(57) Abstract

The present invention provides a simplified method for identifying differences in nucleic acid abundances (e.g., expression levels) between two or more samples. The methods involve providing an array containing a large number (e.g., greater than 1,000) of arbitrarily selected different oligonucleotide probes where the sequence and location of each different probe is known. Nucleic acid samples (e.g., mRNA) from two or more samples are hybridized to the probe arrays and the pattern of hybridization is detected. Differences in the hybridization patterns between the samples indicate differences in expression of various genes between those samples. This invention also provides a method of end-labeling a nucleic acid. In one embodiment, the method involves providing a nucleic acid, providing a labeled oligonucleotide, and then enzymatically ligating the oligonucleotide to the nucleic acid. Thus, for example, where the nucleic acid is an RNA, a labeled oligoribonucleotide can be ligated using an RNA ligase. In another embodiment, the end labeling can be accomplished by providing a nucleic acid, providing labeled nucleoside triphosphates, and attaching the nucleoside triphosphates to the nucleic acid using a terminal transferase.

NUCLEIC ACID ANALYSIS TECHNIQUES

CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation-in-part of U.S.S.N. 60/010,471 filed on January 23, 1996 and a continuation-in-part of provisional patent application for "Labeling of Nucleic Acids" naming Lockhart, Cronin, Lee, Iran, Matsuzaki, McGall and Barone as inventors, filed on January 9, 1997, both of which are herein incorporated by reference for all purposes.

BACKGROUND OF THE INVENTION 

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

Many disease states are characterized by differences in the expression levels of various genes either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. These gains and losses are thought to be "driven" by at least two kinds of genes. Oncogenes are positive regulators of tumorigenesis, while tumor suppressor genes are negative regulators of tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg, Science, 254: 1138-1146 (1991)). Therefore, one mechanism of activating unregulated growth is to increase the number of genes coding for oncogene proteins or to increase the level of expression of these oncogenes (e.g., in response to cellular or environmental changes), and another is to lose genetic material or to decrease the level of expression of genes that code for tumor suppressors. This model is supported by the losses and gains of genetic material associated with glioma progression (Mikkelson et al., J. Cell Biochem. 46: 3-8 (1991)). Thus, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.

Similarly, control of the cell cycle and cell development, as well as diseases, are characterized by variations in the transcription levels of particular genes. Thus, for example, a viral infection is often characterized by the elevated expression of genes of the particular virus. For example, outbreaks of Herpes simplex, Epstein-Barr virus infections (e.g., infectious mononucleosis), cytomegalovirus, Varicella-zoster virus infections, parvovirus infections, human papillomavirus infections, etc. are all characterized by elevated expression of various genes present in the respective virus. Detection of elevated expression levels of characteristic viral genes provides an effective diagnostic of the disease state. In particular, viruses such as herpes simplex enter quiescent states for periods of time only to erupt in brief periods of rapid replication. Detection of expression levels of characteristic viral genes allows detection of such active proliferative (and presumably infective) states.

The use of "traditional" hybridization protocols for monitoring or quantifying gene expression is problematic. For example, two or more gene products of approximately the same molecular weight will prove difficult or impossible to distinguish in a Northern blot because they are not readily separated by electrophoretic methods. Similarly, as hybridization efficiency and cross-reactivity vary with the particular subsequence (region) of a gene being probed, it is difficult to obtain an accurate and reliable measure of gene expression with one, or even a few, probes to the target gene.

The development of VLSIPS™ technology provided methods for synthesizing arrays of many different oligonucleotide probes that occupy a very small surface area. See U.S. Patent No. 5,143,854 and PCT patent publication No. WO 90/15070. U.S. Patent application Serial No. 082,937, filed June 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to provide the complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific nucleotide sequence.

Previous methods of measuring nucleic acid abundance differences or changes in the expression of various genes (e.g., differential display, SAGE, cDNA sequencing, clone spotting, etc.) require assumptions about, or prior knowledge regarding, the target sequences in order to design appropriate sequence-specific probes. Other methods, such as subtractive hybridization, do not require prior sequence knowledge, but also do not directly provide sequence information regarding differentially expressed nucleic acids.

SUMMARY OF THE INVENTION
The present invention, in one embodiment, provides methods of monitoring the expression of a multiplicity of preselected genes (referred to herein as "expression monitoring"). In another embodiment this invention provides a way of identifying differences in the compositions of two or more nucleic acid (e.g., RNA or DNA) samples. Where the nucleic acid abundances reflect expression levels in biological samples from which the samples are derived, the invention provides a method for identifying differences in expression profiles between two or more samples. These "generic difference screening methods" are rapid, simple to apply, require no a priori assumptions regarding the particular sequences whose expression may differ between the two samples, and provide direct sequence information regarding the nucleic acids whose abundances differ between the samples.

In one embodiment, this invention provides a method of identifying differences in nucleic acid levels between two or more nucleic acid samples. The method involves the steps of (a) providing one or more oligonucleotide arrays, said arrays comprising probe oligonucleotides attached to a surface; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; (c) contacting said one or more arrays with a nucleic acid ligase; and (d) determining differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.

In another embodiment, the method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more oligonucleotide arrays comprising probe oligonucleotides wherein said probe oligonucleotides comprise a constant region and a variable region; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and said variable regions that are complementary to said nucleic acids or subsequences thereof; and (c) determining differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.
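
The constant-region/variable-region probe design can be sketched in Python. This is a minimal illustration, not the synthesis protocol of the invention: it assumes the constant region sits on the 5' side of the variable region (the text does not specify an orientation), uses a made-up constant sequence, and enumerates every variable region of a chosen length. The function name `build_probes` is hypothetical.

```python
from itertools import product


def build_probes(constant: str, variable_length: int,
                 constant_at_5prime: bool = True) -> list[str]:
    """Join one fixed constant region to every possible variable region.

    Assumption: sequences are written 5'->3' in the DNA alphabet, and the
    constant region's position (5' or 3' of the variable region) is a
    design choice, not something stated in the source text.
    """
    probes = []
    for bases in product("ACGT", repeat=variable_length):
        variable = "".join(bases)
        if constant_at_5prime:
            probes.append(constant + variable)
        else:
            probes.append(variable + constant)
    return probes


# Hypothetical constant region; a 3-nt variable region gives 4**3 = 64 probes.
probes = build_probes("GATTACAGATTACA", 3)
print(len(probes))  # 64
print(probes[:2])   # ['GATTACAGATTACAAAA', 'GATTACAGATTACAAAC']
```

Only the variable region is meant to pair with sample nucleic acids; because every probe carries the same constant region, a single constant oligonucleotide complementary to that region can pair with all of the probes at once.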

In yet another embodiment, the method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more high density oligonucleotide arrays; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining the differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.
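
Step (c), determining differences in hybridization, can be illustrated with a short Python sketch that compares per-probe signal intensities from two samples. The two-fold cutoff, the intensity floor that guards against division by zero, and the function name `differential_probes` are illustrative assumptions; the text does not prescribe a particular statistic.

```python
import math


def differential_probes(sample_a: dict[str, float], sample_b: dict[str, float],
                        fold: float = 2.0, floor: float = 1.0) -> dict[str, float]:
    """Return {probe_id: log2(B/A)} for probes whose signals differ by >= `fold`.

    Only probes measured in both samples are compared. Intensities below
    `floor` are raised to `floor` so near-zero signals do not give
    infinite ratios.
    """
    threshold = math.log2(fold)
    flagged = {}
    for probe_id in sample_a.keys() & sample_b.keys():
        a = max(sample_a[probe_id], floor)
        b = max(sample_b[probe_id], floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= threshold:
            flagged[probe_id] = log_ratio
    return flagged


# Hypothetical intensities for three probe locations on two arrays.
a = {"p1": 100.0, "p2": 50.0, "p3": 10.0}
b = {"p1": 110.0, "p2": 200.0, "p3": 2.0}
print(sorted(differential_probes(a, b)))  # ['p2', 'p3']
```

Because the sequence and location of every probe are known, each flagged probe ID maps directly to a sequence, which is how generic difference screening yields sequence information without prior assumptions about the targets.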

In still yet another embodiment, the method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more oligonucleotide arrays each comprising probe oligonucleotides wherein said probe oligonucleotides are not chosen to hybridize to nucleic acids derived from particular preselected genes or mRNAs; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.

In another embodiment, the methods of identifying differences in nucleic acid levels between two or more nucleic acid samples involve the steps of: (a) providing one or more oligonucleotide arrays each comprising probe oligonucleotides wherein said probe oligonucleotides comprise a nucleotide sequence or subsequences selected according to a process selected from the group consisting of a random selection, a haphazard selection, a nucleotide composition biased selection, and all possible oligonucleotides of a preselected length; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.

In another embodiment, the methods of identifying differences in nucleic acid levels between two or more nucleic acid samples involve the steps of: (a) providing one or more oligonucleotide arrays each comprising probe oligonucleotides wherein said probe oligonucleotides comprise a nucleotide sequence or subsequences selected according to a process selected from the group consisting of a random selection, a haphazard selection, a nucleotide composition biased selection, and all possible oligonucleotides of a preselected length; (b) providing software describing the location and sequence of probe oligonucleotides on said array; (c) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (d) operating said software such that said hybridizing indicates differences in said nucleic acid levels.
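
The selection processes listed in step (a), together with the software describing the location and sequence of each probe in step (b), can be sketched as a lookup table from array coordinates to probe sequences. In this hypothetical Python sketch, `gc_fraction=0.5` draws every base with equal probability (a random selection), while other values give a nucleotide composition biased selection; the grid layout, the seed, and the function name `design_random_array` are assumptions.

```python
import random


def design_random_array(rows: int, cols: int, probe_length: int,
                        gc_fraction: float = 0.5,
                        seed: int = 0) -> dict[tuple[int, int], str]:
    """Map each (row, col) feature to a randomly drawn probe sequence.

    gc_fraction is the expected fraction of G + C per position; 0.5 makes
    all four bases equally likely. Duplicate sequences are possible and
    are not removed in this sketch.
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    at_weight = (1.0 - gc_fraction) / 2
    gc_weight = gc_fraction / 2
    weights = [at_weight, gc_weight, gc_weight, at_weight]  # order: A, C, G, T
    rng = random.Random(seed)  # seeded so the same design can be regenerated
    layout = {}
    for row in range(rows):
        for col in range(cols):
            bases = rng.choices("ACGT", weights=weights, k=probe_length)
            layout[(row, col)] = "".join(bases)
    return layout


layout = design_random_array(rows=4, cols=5, probe_length=20)
print(len(layout))     # 20 features
print(layout[(0, 0)])  # the sequence recorded for the first feature
```

Reading a hybridization signal at a coordinate and looking up `layout[(row, col)]` is then enough to recover the sequence of the probe that produced it.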

This invention also provides methods of simultaneously monitoring the expression of a multiplicity of genes. In one embodiment these methods involve (a) providing a pool of target nucleic acids comprising RNA transcripts of one or more of said genes, or nucleic acids derived from said RNA transcripts; (b) hybridizing said pool of nucleic acids to an oligonucleotide array comprising probe oligonucleotides immobilized on a surface; (c) contacting said oligonucleotide array with a ligase; and (d) quantifying the hybridization of said nucleic acids to said array wherein said quantifying provides a measure of the levels of transcription of said genes.

Still yet another method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more arrays of oligonucleotides, each array comprising pairs of probe oligonucleotides where the members of each pair of probe oligonucleotides differ from each other in preselected nucleotides; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining the differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.

Another method of simultaneously monitoring the expression of a multiplicity of genes involves the steps of: (a) providing one or more oligonucleotide arrays comprising probe oligonucleotides wherein said probe oligonucleotides comprise a constant region and a variable region; (b) providing a pool of target nucleic acids comprising RNA transcripts of one or more of said genes, or nucleic acids derived from said RNA transcripts; (c) hybridizing said pool of nucleic acids to an array of oligonucleotide probes immobilized on a surface; and (d) quantifying the hybridization of said nucleic acids to said array wherein said quantifying provides a measure of the levels of transcription of said genes.

This invention additionally provides methods of making a nucleic acid array for identifying differences in nucleic acid levels between two or more nucleic acid samples. In one embodiment the method involves the steps of: (a) providing an oligonucleotide array comprising probe oligonucleotides wherein said probe oligonucleotides comprise a constant region and a variable region; (b) hybridizing one or more of said nucleic acid samples to said arrays to form hybrid duplexes of said variable region and nucleic acids in said nucleic acid samples comprising subsequences complementary to said variable region; (c) attaching the sample nucleic acids comprising said hybrid duplexes to said array of probe oligonucleotides; and (d) removing unattached nucleic acids to provide a high density oligonucleotide array bearing sample nucleic acids attached to said array.

In another embodiment the method of making a nucleic acid array for identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing a high density array; (b) contacting said array with one or more of said two or more nucleic acid samples whereby nucleic acids of said one of said two or more nucleic acid samples form hybrid duplexes with probe oligonucleotides in said arrays; (c) attaching sample nucleic acids comprising said hybrid duplexes to said array of probe oligonucleotides; and (d) removing unattached nucleic acids to provide a high density oligonucleotide array bearing sample nucleic acids attached to said array.

This invention additionally provides kits for practice of the methods described herein. One kit comprises a container containing one or more oligonucleotide arrays, said arrays comprising probe oligonucleotides attached to a surface; and a container containing a ligase. Another kit comprises a container containing one or more oligonucleotide arrays, said arrays comprising probe oligonucleotides wherein said probe oligonucleotides comprise a constant region and a variable region. This kit optionally includes a constant oligonucleotide complementary to said constant region or a subsequence thereof.

Preferred high density oligonucleotide arrays of this invention comprise more than 100 different probe oligonucleotides wherein: each different probe oligonucleotide is localized in a predetermined region of the array; each different probe oligonucleotide is attached to a surface through a terminal covalent bond; and the density of said different probe oligonucleotides is greater than about 60 different oligonucleotides per 1 cm². The high density arrays can be used in all of the array-based methods discussed herein. High density arrays used for expression monitoring will typically include oligonucleotide probes selected to be complementary to a nucleic acid derived from one or more preselected genes. In contrast, generic difference screening arrays may contain probe oligonucleotides selected randomly, haphazardly, arbitrarily, or including sequences or subsequences comprising all possible nucleic acid sequences of a particular (preselected) length.
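
The numerical part of the high density criteria (more than 100 different probes at a density above about 60 different oligonucleotides per cm²) reduces to simple arithmetic, sketched below in Python. The probe counts, chip dimensions, and function names are hypothetical, and the other criteria (predetermined location, attachment through a terminal covalent bond) are not something this check can verify.

```python
def probe_density(num_distinct_probes: int, area_cm2: float) -> float:
    """Different probe oligonucleotides per square centimetre."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return num_distinct_probes / area_cm2


def meets_density_criteria(num_distinct_probes: int, area_cm2: float,
                           min_probes: int = 100,
                           min_density: float = 60.0) -> bool:
    """True if there are > min_probes different probes at > min_density per cm^2."""
    return (num_distinct_probes > min_probes
            and probe_density(num_distinct_probes, area_cm2) > min_density)


# Hypothetical designs: 65,536 probes on a 1.28 cm x 1.28 cm chip,
# versus 150 probes spread over 5 cm^2 (30 per cm^2).
print(meets_density_criteria(65_536, 1.28 * 1.28))  # True
print(meets_density_criteria(150, 5.0))             # False
```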

In a preferred embodiment, pools of oligonucleotides or oligonucleotide subsequences comprising all possible nucleic acids of a particular length are selected from the group consisting of all possible 6 mers, all possible 7 mers, all possible 8 mers, all possible 9 mers, all possible 10 mers, all possible 11 mers, and all possible 12 mers.
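
A pool of all possible n-mers contains 4^n different sequences, so the preferred lengths span 4,096 (6 mers) to about 16.8 million (12 mers) probes. The Python sketch below is a hypothetical illustration rather than a procedure from the text: it computes pool sizes and shows that, in a complete pool ordered lexicographically (A < C < G < T), every length-n window of any target has exactly one complementary probe, whose position follows from reading the probe as a base-4 number. All function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pool_size(n: int) -> int:
    """Number of different oligonucleotides of length n (DNA alphabet)."""
    return 4 ** n


def kmer_index(kmer: str) -> int:
    """Position of kmer in the lexicographically ordered list of all len(kmer)-mers."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_VALUE[base]
    return index


def complementary_probe_indices(target: str, n: int) -> list[int]:
    """For each length-n window of target, the index of its perfectly complementary probe."""
    indices = []
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        probe = window.translate(COMPLEMENT)[::-1]  # reverse complement, written 5'->3'
        indices.append(kmer_index(probe))
    return indices


for n in range(6, 13):
    print(n, pool_size(n))  # 6 4096 ... 12 16777216

print(complementary_probe_indices("ACGTTGCA", 6))
```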

This invention also provides methods of labeling a nucleic acid. In one embodiment, this method involves the steps of: (a) providing a nucleic acid; (b) amplifying said nucleic acid to form amplicons; (c) fragmenting said amplicons to form fragments of said amplicons; and (d) coupling a labeled moiety to at least one of said fragments.

In another embodiment, the methods involve the steps of: (a) providing a nucleic acid; (b) transcribing said nucleic acid to form a transcribed nucleic acid; (c) fragmenting said transcribed nucleic acid to form fragments of said transcribed nucleic acid; and (d) coupling a labeled moiety to at least one of said fragments.

In yet another embodiment, the methods involve the steps of: (a) providing at least one nucleic acid coupled to a support; (b) providing a labeled moiety capable of being coupled with a terminal transferase to said nucleic acid; (c) providing said terminal transferase; and (d) coupling said labeled moiety to said nucleic acid using said terminal transferase.

In still another embodiment, the methods involve the steps of: (a) providing at least two nucleic acids coupled to a support; (b) increasing the number of monomer units of said nucleic acids to form a common nucleic acid tail on said at least two nucleic acids; (c) providing a labeled moiety capable of recognizing said common nucleic acid tails; and (d) contacting said common nucleic acid tails and said labeled moiety.

In still yet another embodiment, the methods involve the steps of: (a) providing at least one nucleic acid coupled to a support; (b) providing a labeled moiety capable of being coupled with a ligase to said nucleic acid; (c) providing said ligase; and (d) coupling said labeled moiety to said nucleic acid using said ligase.

The invention also provides compounds of the formulas described herein.

Definitions. 

An array of oligonucleotides as used herein refers to a multiplicity of different (sequence) oligonucleotides attached (preferably through a single terminal covalent bond) to one or more solid supports where, when there is a multiplicity of supports, each support bears a multiplicity of oligonucleotides. The term "array" can refer to the entire collection of oligonucleotides on the support(s) or to a subset thereof. The term "same array" when used to refer to two or more arrays is used to mean arrays that have substantially the same oligonucleotide species thereon in substantially the same abundances. The spatial distribution of the oligonucleotide species may differ between the two arrays, but, in a preferred embodiment, it is substantially the same. It is recognized that even where two arrays are designed and synthesized to be identical there are variations in the abundance, composition, and distribution of oligonucleotide probes. These variations are preferably insubstantial and/or compensated for by the use of controls as described herein.

The phrase "massively parallel screening" refers to the simultaneous screening of at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 1,000,000 different nucleic acid hybridizations.

The terms "nucleic acid" or "nucleic acid molecule" refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, encompass known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides.

An oligonucleotide is a single-stranded nucleic acid ranging in length from 2 to about 1000 nucleotides, more typically from 2 to about 500 nucleotides in length.

As used herein a "probe" is defined as an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, an oligonucleotide probe may include natural (i.e., A, G, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in an oligonucleotide probe may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, oligonucleotide probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.

The term "target nucleic acid" refers to a nucleic acid (often derived from a biological sample and hence referred to also as a sample nucleic acid) to which the oligonucleotide probe specifically hybridizes. It is recognized that the target nucleic acids can be derived from essentially any source of nucleic acids (e.g., including, but not limited to, chemical syntheses, amplification reactions, forensic samples, etc.). It is either the presence or absence of one or more target nucleic acids that is to be detected, or the amount of one or more target nucleic acids that is to be quantified. The target nucleic acid(s) that are detected preferentially have nucleotide sequences that are complementary to the nucleic acid sequences of the corresponding probe(s) to which they specifically bind (hybridize). The term target nucleic acid may refer to the specific subsequence of a larger nucleic acid to which the probe specifically hybridizes, or to the overall sequence (e.g., gene or mRNA) whose abundance (concentration) and/or expression level it is desired to detect. The difference in usage will be apparent from context.
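
The relationship between a probe and the target subsequence to which it specifically hybridizes can be made concrete with a small Python sketch. Because the two strands pair antiparallel, a probe written 5'->3' pairs perfectly with a target subsequence equal to the probe's reverse complement. The sketch assumes DNA-alphabet sequences and exact Watson-Crick matching only, ignoring partial matches and hybridization conditions; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_sites(probe: str, target: str) -> list[int]:
    """0-based start positions of target subsequences perfectly complementary to probe."""
    site = reverse_complement(probe)
    target = target.upper()
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


print(binding_sites("GGA", "ATCCGTCC"))  # [1, 5]
```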

A "ligatable oligonucleotide" or "ligatable probe" or "ligatable oligonucleotide probe" refers to an oligonucleotide that is capable of being ligated to another oligonucleotide by the use of a ligase (e.g., T4 DNA ligase). The ligatable oligonucleotide is preferably a deoxyribonucleotide. The nucleotides comprising the ligatable oligonucleotide are preferably the "standard" nucleotides: A, G, C, and T or U. However, derivatized, modified, or alternative nucleotides (e.g., inosine) can be present as long as their presence does not interfere with the ligation reaction. The ligatable probe may be labeled or otherwise modified as long as the label does not interfere with the ligation reaction. Similarly, the internucleotide linkages can be modified as long as the modification does not interfere with ligation. Thus, in some instances, the ligatable oligonucleotide can be a peptide nucleic acid.

"Subsequence" refers to a sequence of nucleic acids that comprises a part of a longer sequence of nucleic acids.

A "wobble" refers to a degeneracy at a particular position in an oligonucleotide. A fully degenerate or "4 way" wobble refers to a collection of nucleic acids (e.g., oligonucleotide probes) having A, G, C, or T for DNA, or A, G, C, or U for RNA, at the wobble position. A wobble may be approximated by the replacement of the nucleotide with inosine, which will base pair with A, G, C, or T or U. Typically, oligonucleotides containing a fully degenerate wobble are produced during chemical synthesis by using a mixture of four different nucleotide monomers at the particular coupling step in which the wobble is to be introduced.
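
Synthesizing a wobble with a mixture of four monomers yields every sequence variant at that position. The Python sketch below enumerates the resulting collection; it borrows the IUPAC convention of writing a fully degenerate position as `N`, which is an assumption of the sketch rather than notation used in the text, and the function name is hypothetical.

```python
from itertools import product


def expand_wobbles(seq: str, wobble: str = "N", alphabet: str = "ACGT") -> list[str]:
    """List every sequence represented by seq, where each `wobble` position
    may be any base in alphabet (a fully degenerate, 4-way wobble for DNA)."""
    choices = [alphabet if base == wobble else base for base in seq]
    return ["".join(variant) for variant in product(*choices)]


print(expand_wobbles("ACNT"))        # ['ACAT', 'ACCT', 'ACGT', 'ACTT']
print(len(expand_wobbles("NCNTN")))  # 64, since each wobble multiplies by 4
```

An inosine at the same position would instead be a single molecule that approximates this whole collection.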

The term "cross-linking" when used in reference to cross-linking nucleic acids refers to attaching nucleic acids such that they are not separated under typical conditions that are used to denature complementary nucleic acid sequences. Cross-linking preferably involves the formation of covalent linkages between the nucleic acids. Methods of cross-linking nucleic acids are described herein.

The phrase "coupled to a support" means bound directly or indirectly thereto, including attachment by covalent binding, hydrogen bonding, ionic interaction, hydrophobic interaction, or otherwise.

"Amplicons" are the products of the amplification of nucleic acids by PCR or otherwise.

"Transcribing a nucleic acid" means the formation of a ribonucleic acid from a deoxyribonucleic acid and the converse (the formation of a deoxyribonucleic acid from a ribonucleic acid). A nucleic acid can be transcribed by DNA-dependent RNA polymerase, reverse transcriptase, or otherwise.

A labeled moiety means a moiety capable of being detected by the various 
methods discussed herein or known in the art. 

The term "complexity" is used here according to the standard meaning of this term as established by Britten et al., Methods of Enzymol. 29:363 (1974). See also Cantor and Schimmel, Biophysical Chemistry: Part III at 1228-1230 for further explanation of nucleic acid complexity.

"Bind(s) substantially" refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.

The phrase "hybridizing specifically to" refers to the binding, duplexing, or hybridizing of a molecule preferentially to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) of DNA or RNA. The term "stringent conditions" refers to conditions under which a probe will hybridize preferentially to its target subsequence, and to a lesser extent to, or not at all to, other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (As the target sequences are generally present in excess, at Tm, 50% of the probes are occupied at equilibrium.) Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.2 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
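
Setting a stringent temperature about 5°C below Tm is simple arithmetic once a Tm is available. The Python sketch below needs a Tm estimate from somewhere; as a swapped-in assumption not taken from this text, it uses the Wallace rule of thumb (2°C per A or T plus 4°C per G or C), which is only a rough approximation for short oligonucleotides at high salt. Measured or model-based Tm values should be preferred, and the function names are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2 per A/T, 4 per G/C.

    Assumption: a crude estimate for short DNA oligonucleotides only; it
    ignores salt, probe concentration, and nearest-neighbour effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def stringent_temperature(tm: float, offset: float = 5.0) -> float:
    """Hybridization temperature about `offset` degrees below Tm."""
    return tm - offset


probe = "ACGTACGTACGT"                # hypothetical 12-mer: 6 A/T, 6 G/C
tm = wallace_tm(probe)                # 2*6 + 4*6 = 36
print(tm, stringent_temperature(tm))  # 36 31.0
```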

The term "perfect match probe" refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match (PM) probe can be a "test probe", a "normalization control" probe, an expression level control probe and the like. A perfect match control or perfect match probe is, however, distinguished from a "mismatch control" or "mismatch probe." In the case of expression monitoring arrays, perfect match probes are typically preselected (designed) to be complementary to particular sequences or subsequences of target nucleic acids (e.g., particular genes). In contrast, in generic difference screening arrays, the particular target sequences are typically unknown. In the latter case, perfect match probes cannot be preselected. The term perfect match probe in this context is to distinguish that probe from a corresponding "mismatch control" that differs from the perfect match in one or more particular preselected nucleotides as described below.

The term "mismatch control" or "mismatch probe", in expression 
refers to probes whose sequence is deliberately selected not to be 
perfectly complementary to a particular target sequence, For each mismatch (MM) control 
in a high-density array there preferably exists a corresponding perfect match (PM) probe 

20 that is perfectly complementary to the same particular target sequence. In "generic" (e.g., 
random, arbitrary, haphazard, etc.) arrays, since the target nucleic acid(s) are unknown 
perfect match and mismatch probes cannot be a priori determined, designed, or selected. 
In this instance, the probes are preferably provided as pairs where each pair of probes differ 
in one or more preselected nucleotides. Thus, while it is not known a priori which of the 

25 probes in the pair is the perfect match, it is knovm that when one probe specifically 

hvbndizes to a pariiculai target scqirence, the otV^ei- probe of the pair will act as a mismatch 
coi tiC" *>.r thzt i2rii*^l be-^'ficncc. It vAli be appreciated that the perfect match and m!S"*mch 
probes need not be provided as pairs, but may be provided as larger collecuun:. {e.g., 3. 4, 
S, or iTiorc) cf probes differ from each other in particula/ pieselectcd nucleotides. 

While the mismatch(es) may be located anywhere in the mismatch probe, terminal
mismatches are less desirable, as a terminal mismatch is less likely to prevent hybridization
of the target sequence. In a particularly preferred embodiment, the mismatch is located at
or near the center of the probe such that the mismatch is most likely to destabilize the
duplex with the target sequence under the test hybridization conditions. In a particularly
preferred embodiment, perfect matches differ from mismatch controls in a single centrally
located nucleotide.
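The pairing rule can be illustrated with a short Python sketch. It assumes, for illustration only, that the mismatch control is made by replacing the central base of the perfect match probe with its Watson-Crick complement; the text itself requires only a preselected difference at or near the center. The helpers `mismatch_probe` and `hamming` and the example probe are hypothetical.

```python
# Watson-Crick complements used for the substituted base (an assumption:
# any non-matching base at that position would also yield a mismatch control).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Return a mismatch control that differs from `pm` only at its central base."""
    pm = pm.upper()
    if not pm or any(base not in COMPLEMENT for base in pm):
        raise ValueError("probe must be a non-empty A/C/G/T string")
    center = len(pm) // 2  # 0-based index; position 13 of a 25-mer
    return pm[:center] + COMPLEMENT[pm[center]] + pm[center + 1:]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


pm = "ATGCATGCATGCATGCATGCATGCA"  # hypothetical 25-mer perfect match probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Because the mismatch sits at the center, both ends of the duplex can still form, and the single internal mismatch is the one most likely to destabilize it.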

The terms "background" or "background signal intensity" refer to
hybridization signals resulting from non-specific binding, or other interactions, between
the labeled target nucleic acids and components of the oligonucleotide array (e.g., the
oligonucleotide probes, control probes, the array substrate, etc.). Background signals may
also be produced by intrinsic fluorescence of the array components themselves. A single
background signal can be calculated for the entire array, or a different background signal
may be calculated for each region of the array. In a preferred embodiment, background is
calculated as the average hybridization signal intensity for the lowest 1% to 10% of the
probes in the array, or region of the array. In expression monitoring arrays (i.e., where
probes are preselected to hybridize to specific nucleic acids (genes)), a different
background signal may be calculated for each target nucleic acid. Where a different
background signal is calculated for each target gene, the background signal is calculated
for the lowest 1% to 10% of the probes for each gene. Of course, one of skill in the art will
appreciate that where the probes to a particular gene hybridize well and thus appear to be
specifically binding to a target sequence, they should not be used in a background signal
calculation. Alternatively, background may be calculated as the average hybridization
signal intensity produced by hybridization to probes that are not complementary to any
sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or
to genes not found in the sample, such as bacterial genes where the sample is of mammalian
origin). Background can also be calculated as the average signal intensity produced by
regions of the array that lack any probes at all.
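The lowest-percentile background estimate can be sketched in a few lines of Python. The helpers `background` and `regional_background` are hypothetical, the 2% default is an arbitrary choice within the 1% to 10% range given above, and the caller is assumed to have already excluded probes that appear to bind their target specifically.

```python
import math
from statistics import mean


def background(intensities, fraction=0.02):
    """Average intensity of the dimmest `fraction` of probes (at least one probe)."""
    if not intensities:
        raise ValueError("no intensities supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(intensities)
    k = max(1, math.ceil(len(ranked) * fraction))  # number of dimmest probes to average
    return mean(ranked[:k])


def regional_background(regions, fraction=0.02):
    """Compute a separate background value for each named region (or gene) of the array."""
    return {name: background(values, fraction) for name, values in regions.items()}


print(background([5, 1, 3, 2, 4, 100, 6, 7, 8, 9], fraction=0.2))
```

The same function serves the whole-array, per-region, and per-gene variants; only the grouping of intensities passed in changes.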

The term "quantifying", when used in the context of quantifying nucleic acid
abundances or concentrations (e.g., transcription levels of a gene), can refer to absolute or
to relative quantification. Absolute quantification may be accomplished by inclusion of
known concentration(s) of one or more target nucleic acids (e.g., control nucleic acids such
as BioB, or known amounts of the target nucleic acids themselves) and referencing the
hybridization intensity of unknowns with the known target nucleic acids (e.g., through
generation of a standard curve). Alternatively, relative quantification can be accomplished
by comparison of hybridization signals between two or more genes, or between two or
more treatments, to quantify the changes in hybridization intensity and, by implication,
transcription level.
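A standard curve for absolute quantification can be sketched as a least-squares line in log-log space, inverted to read off the concentration of an unknown. This assumes the response is linear on a log/log scale over the calibrated range; as Fig. 3 notes, the response becomes sublinear near probe saturation. The function names and the spike-in values are hypothetical.

```python
import math


def fit_standard_curve(concentrations, signals):
    """Least-squares line through (log10 concentration, log10 signal) points."""
    if len(concentrations) != len(signals) or len(concentrations) < 2:
        raise ValueError("need at least two paired standards")
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_concentration(signal, slope, intercept):
    """Invert the fitted curve to estimate an unknown's concentration from its signal."""
    return 10 ** ((math.log10(signal) - intercept) / slope)


# Hypothetical spike-in standards (e.g., a control such as BioB) at known levels.
slope, intercept = fit_standard_curve([1, 10, 100], [2, 20, 200])
print(round(estimate_concentration(50, slope, intercept), 6))  # prints 25.0
```

Signals outside the range spanned by the standards should not be extrapolated, since the linear fit says nothing about saturation.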

The "percentage of sequence identity" or "sequence identity" is determined
by comparing two optimally aligned sequences or subsequences over a comparison
window or span, wherein the portion of the polynucleotide sequence in the comparison
window may optionally comprise additions or deletions (i.e., gaps) as compared to the
reference sequence (which does not comprise additions or deletions) for optimal alignment
of the two sequences. The percentage is calculated by determining the number of positions
at which the identical subunit (e.g., nucleic acid base or amino acid residue) occurs in both
sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison, and multiplying
the result by 100 to yield the percentage of sequence identity. Percentage sequence identity
calculated using the programs GAP or BESTFIT (see below) is calculated using
default gap weights.
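The arithmetic of this definition can be sketched as follows, assuming the two sequences have already been optimally aligned and that every column of the window, including gap columns, counts toward its length. That gap convention is an assumption of the sketch; GAP and BESTFIT apply their own rules. `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that are already optimally aligned.

    Every alignment column counts toward the window length; a column counts
    as matched only if both subunits are the same non-gap character.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matched = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matched / len(aligned_a)


# 5 matched positions in a 6-position window -> about 83.3% identity.
print(percent_identity("ACGT-A", "ACGTTA"))
```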

Methods of alignment of sequences for comparison are well known in the
art. Optimal alignment of sequences for comparison may be conducted by the local
homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the
homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by
the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:
2444 (1988), by computerized implementations of these algorithms (including, but not
limited to, CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View,
California; GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software
Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wisconsin, USA),
or by inspection. In particular, methods for aligning sequences using the CLUSTAL
program are well described by Higgins and Sharp in Gene, 73: 237-244 (1988) and in
CABIOS 5: 151-153 (1989).
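For readers unfamiliar with the global method cited above, the following is a minimal textbook sketch of Needleman-Wunsch alignment with a linear gap penalty. The scores (match +1, mismatch -1, gap -2) are arbitrary illustrative assumptions, not the defaults of GAP or any other cited program.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of `a` and `b`; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair the two bases
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGTA", "ACTA"))
```

The aligned strings it returns are in the form expected when computing percentage sequence identity over the comparison window.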
BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a schematic of expression monitoring using oligonucleotide
arrays. Extracted poly(A)+ RNA is converted to cDNA, which is then transcribed in the
presence of labeled ribonucleotide triphosphates. L is either biotin or a dye such as
fluorescein. RNA is fragmented with heat in the presence of magnesium ions.
Hybridizations are carried out in a flow cell that contains the two-dimensional DNA probe
arrays. Following a brief washing step to remove unhybridized RNA, the arrays are
scanned using a scanning confocal microscope. Alternatives in which cellular mRNA is
directly labeled without a cDNA intermediate are described in the Examples. Image
analysis software converts the scanned array images into text files in which the observed
intensities at specific physical locations are associated with particular probe sequences.

Fig. 2A shows a fluorescent image of a high density array containing over
16,000 different oligonucleotide probes. The image was obtained following hybridization
(15 hours at 40°C) of biotin-labeled randomly fragmented sense RNA transcribed from the
murine B cell (T10) cDNA library, and spiked at the level of 1:3,000 (50 pM, equivalent to
about 100 copies per cell) with 13 specific RNA targets. The brightness at any location is
indicative of the amount of labeled RNA hybridized to the particular oligonucleotide
probe. Fig. 2B shows a small portion of the array (the boxed region of Fig. 2A) containing
probes for IL-2 and IL-3 RNAs. For comparison, Fig. 2C shows the same region of
the array following hybridization with an unspiked T10 RNA sample (T10 cells do not
express IL-2 and IL-3). The variation in the signal intensity was highly reproducible and
reflected the sequence dependence of the hybridization efficiencies. The central cross and
the four corners of the array contain a control sequence that is complementary to a
biotin-labeled oligonucleotide that was added to the hybridization solution at a constant
concentration (50 pM). The sharpness of the images near the boundaries of the features
was limited by the resolution of the reading device (11.25 µm) and not by the spatial
resolution of the array synthesis. Pixels in the border regions of each synthesis feature
were systematically ignored in the quantitative analysis of the images.

Fig. 3 provides a log/log plot of the hybridization intensity (average of the
PM-MM intensity differences for each gene) versus concentration for 11 different RNA
targets. The hybridization signals were quantitatively related to target concentration. The
experiments were performed as described in the Examples herein and in Fig. 2. The ten
cytokine RNAs (plus bioB) were spiked into labeled T10 RNA at levels ranging from
1:300,000 to 1:3,000. The signals continued to increase with increased concentration up to
frequencies of 1:300, but the response became sublinear at the high levels due to saturation
of the probe sites. The linear range can be extended to higher concentrations by using
shorter hybridization times. RNAs from genes expressed in T10 cells (IL-10, β-actin and
GAPDH) were also detected at levels consistent with results obtained by probing cDNA
libraries.

Fig. 4 shows cytokine mRNA levels in the murine 2D6 T helper cell line at
different times following stimulation with PMA and a calcium ionophore. Poly(A)+ RNA
was extracted at 0, 2, 6, and 24 hours following stimulation and converted to double
stranded cDNA containing an RNA polymerase promoter. The cDNA pool was then
transcribed in the presence of biotin-labeled ribonucleotide triphosphates, fragmented, and
hybridized to the oligonucleotide probe arrays for 2 and 22 hours. The fluorescence
intensities were converted to RNA frequencies by comparison with the signals obtained for
a bacterial RNA (biotin synthetase) spiked into the samples at known amounts prior to
hybridization. A signal of 50,000 corresponds to a frequency of approximately 1:100,000
to a frequency of 1:5,000, and a signal of 100 to a frequency of 1:50,000. RNAs for IL-2,
IL-4, IL-6, and IL-12p40 were not detected above the level of approximately 1:200,000 in
these experiments. The error bars reflect the estimated uncertainty (25 percent) in the level
for a given RNA relative to the level for the same RNA at a different time point. The
relative uncertainty estimate was based on the results of repeated spiking experiments, and
on repeated measurements of IL-10, β-actin and GAPDH RNAs in preparations from both
T10 and 2D6 cells (unstimulated). The uncertainty in the absolute frequencies includes
message-to-message differences in the hybridization efficiency as well as differences in the
mRNA isolation, cDNA synthesis, and RNA synthesis and labeling steps. The uncertainty
in the absolute frequencies is estimated to be a factor of three.

Fig. 5 shows a fluorescence image of an array containing over 63,000
different oligonucleotide probes for 118 genes. The image was obtained following
overnight hybridization of a labeled murine B cell RNA sample. Each square synthesis
region is 50 x 50 µm and contains 10^7 to 10^8 copies of a specific oligonucleotide. The
array was scanned at a resolution of 7.5 µm in approximately 15 minutes. The bright rows
indicate RNAs present at high levels. Lower level RNAs were unambiguously detected
based on quantitative evaluation of the hybridization patterns. A total of 21 murine RNAs
were detected at levels ranging from approximately 1:300,000 to 1:100. The cross in the
center, the checkerboard in the corners, and the ML'R-I region at the top contain probes
complementary to a labeled control oligonucleotide that was added to all samples.

Fig. 6 shows an example of a computer system used to execute the software 
of an embodiment of the present invention. 

Fig. 7 shows a system block diagram of a typical computer system used to
execute the software of an embodiment of the present invention.

Fig. 8 shows the high level flow of a process of monitoring the expression
of a gene by comparing hybridization intensities of pairs of perfect match and mismatch
probes.

Fig. 9 shows the flow of a process of determining if a gene is expressed
utilizing a decision matrix.

Figs. 10A and 10B show the flow of a process of determining the
expression of a gene by comparing baseline scan data and experimental scan data.

Fig. 11 shows the flow of a process of increasing the number of probes for
monitoring the expression of genes after the number of probes has been reduced or pruned.

Figs. 12a and 12b illustrate the probe oligonucleotide/ligation reaction
system. Fig. 12a generally illustrates the various components of the probe
oligonucleotide/ligation reaction system. Fig. 12b illustrates discrimination of non-perfectly
complementary target:oligonucleotide hybrids using the probe
oligonucleotide/ligation reaction system.

Figs. 13a, 13b, 13c, and 13d illustrate the various components of
ligation/hybridization reactions and illustrate various ligation strategies. Fig. 13a
illustrates various components of the ligation/hybridization reaction, some of which are
optional in various embodiments. Fig. 13b illustrates a ligation strategy that discriminates
mismatches at the terminus of the probe oligonucleotide. Fig. 13c illustrates a ligation
strategy that discriminates mismatches at the terminus of the sample oligonucleotide. Fig.
13d illustrates a method for improving the discrimination at both the probe terminus and
the sample terminus.

Figs. 14a, 14b, 14c, and 14d illustrate a ligation discrimination used in
conjunction with a restriction digest of the sample nucleic acid. Fig. 14a shows the
recognition site and cleavage pattern of SacI (a 6-cutter) and Hsp92 II (a 4-cutter). Fig. 14b
illustrates the effect of SacI cleavage on a (target) nucleic acid sample. Fig. 14c illustrates
a 6 Mb genome (i.e., E. coli) digested with SacI and SphI, generating ~1 kb genomic
fragments with a 5' C. Fig. 14d illustrates the hybridization/ligation of these fragments to a
generic difference screening chip and their subsequent use as probes to hybridize to the
appropriate nucleic acid (Format I), or, alternatively, the labeling of the fragments and their
hybridization/ligation to the oligonucleotide array for direct analysis (Format II).

Figs. 15a, 15b, 15c, 15d, and 15e illustrate the analysis of differential display
DNA fragments on a generic difference screening array. Fig. 15a shows first strand cDNA
synthesis by reverse transcription of poly(A) mRNA using an anchored poly(T) primer.
Fig. 15b illustrates upstream primers for the PCR reaction containing an engineered restriction
site and degenerate bases (N = A, G, C, T) at the 3' end. Fig. 15c shows randomly primed PCR of
first strand cDNA. Fig. 15d shows restriction digest of PCR products, and Fig. 15e shows
sorting of PCR products on a generic ligation array by their 5' end.

Figs. 16a, 16b, and 16c illustrate the differences between replicate 1 and
replicate 2 for sample 1 and sample 2 nucleic acids. Fig. 16a shows the differences
between replicate 1 and replicate 2 for sample 1, the normal cell line. Fig. 16b shows the
differences between replicate 1 and replicate 2 for sample 2, the tumor cell line. Figure
16c plots the differences between samples 1 and 2 averaged over the two replicates.

Figs. 17a, 17b, and 17c illustrate the data of Figs. 16a, 16b, and 16c,
filtered. Figure 17a shows the relative change in hybridization intensities of replicates 1 and
2 of sample 1 for the difference of each oligonucleotide pair. Fig. 17b shows the ratio of
replicates 1 and 2 of sample 2 for the difference of each oligonucleotide pair, normalized,
filtered, and plotted the same way as in Figure 17a. Fig. 17c shows the ratio of sample 1
and sample 2 averaged over two replicates for the difference of each oligonucleotide pair.

The ratio is calculated as in Fig. 17a, but based on the absolute value of
$[(X_{211}+X_{221})/2]/[(X_{111}+X_{121})/2]$ and $[(X_{111}+X_{121})/2]/[(X_{211}+X_{221})/2]$
after normalization as in Fig. 16c.

Fig. 18 illustrates post-fragmentation labeling using a CIAP treatment.

Fig. 19 provides a schematic illustration of post-hybridization end labeling
on a high density oligonucleotide array.

Fig. 20 provides a schematic illustration of end labeling utilizing pre-reaction
of a high density array prior to hybridization and end labeling.

Fig. 21 illustrates the results of a measure of post-hybridization TdTase end
labeling call accuracy.

Fig. 22 illustrates oligo dT labeling on a high density oligonucleotide array.

Fig. 23 illustrates various labeling reagents suitable for use in the methods
disclosed herein. Fig. 23a shows various labeling reagents. Fig. 23b shows still other
labeling reagents. Fig. 23c shows non-ribose or non-2'-deoxyribose-containing labels. Fig.
23d shows sugar-modified nucleotide analogue labels.

Fig. 24 illustrates resequencing of a target DNA molecule with a set of
generic n-mer tiling probes.

Fig. 25 illustrates four tiling arrays present on a 4-mer generic array.

Fig. 26 illustrates base calling at the 8th position in the target.

Fig. 27 illustrates a base vote table.

Fig. 28 illustrates the effect of applying a correctness score transform to HIV
data.

Fig. 29 illustrates mutation detection by intensity comparisons.

Fig. 30 illustrates bubble formation detection of mutation in the HIV
genome.

Fig. 31 illustrates induced difference nearest neighbor probe scoring.

Fig. 32 illustrates mutations found in an HIV PCR target (B) using a generic
GeneChip® array and induced difference analysis.

Fig. 33 illustrates mutation detection using comparisons between a
reference target and a sample target.
DETAILED DESCRIPTION
I. Expression Monitoring and Generic Difference Screening.

This invention provides methods of expression monitoring and generic
difference screening. The term expression monitoring is used to refer to the determination
of levels of expression of particular, typically preselected, genes. In a preferred
embodiment, the expression monitoring methods of this invention utilize high density
arrays of oligonucleotides selected to be complementary to predetermined subsequences of
the gene or genes whose expression levels are to be detected. Nucleic acid samples are
hybridized to the arrays and the resulting hybridization signal provides an indication of the
level of expression of each gene of interest. Because of the high degree of probe
redundancy (typically there are multiple probes per gene), the expression monitoring
methods provide an essentially accurate absolute measurement and do not require
comparison to a reference nucleic acid.
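The probe redundancy mentioned above is what allows a per-gene summary such as the "average of the PM-MM intensity differences for each gene" plotted in Fig. 3. A minimal sketch of that summary, assuming background-corrected intensities listed in matching PM/MM order; `average_difference` is a hypothetical helper name.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of the PM - MM intensity differences over one gene's probe pairs.

    pm[i] and mm[i] are the intensities of the i-th perfect match probe and
    its mismatch control for the same gene.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need matching, non-empty PM and MM intensity lists")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for three probe pairs of one gene.
print(average_difference([100, 80, 60], [40, 30, 20]))
```

Averaging over many pairs dampens the sequence-dependent variation in individual probe efficiencies that Fig. 2 describes.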

In another embodiment, this invention provides generic difference screening
methods that identify differences in the abundance (concentration) of particular nucleic
acids in two or more nucleic acid samples. The generic difference screening methods
involve hybridizing two or more nucleic acid samples to the same high density
oligonucleotide array, or to different high density oligonucleotide arrays having the same
oligonucleotide probe composition and, optionally, the same oligonucleotide spatial
distribution. The resulting hybridization patterns are then compared, allowing determination of which
nucleic acids differ in abundance (concentration) between the two or more samples.
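The comparison step can be sketched as a per-probe ratio test between two hybridizations. This sketch assumes total-intensity normalization and a fixed fold-change cutoff, neither of which is specified here; `differential_probes`, `min_fold`, and `floor` are hypothetical names, and the probe sequences are illustrative.

```python
import math


def differential_probes(sample1, sample2, min_fold=2.0, floor=1.0):
    """Return {probe: log2(sample2 / sample1)} for probes whose normalized
    signal differs by at least `min_fold` between the two samples.

    sample1, sample2: mapping of probe sequence -> background-subtracted intensity.
    floor: intensities below this value are raised to it to avoid dividing by ~0.
    """
    common = sample1.keys() & sample2.keys()
    if not common:
        raise ValueError("samples share no probes")
    # Total-intensity normalization (an assumption) puts both hybridizations on one scale.
    total1 = sum(max(sample1[p], floor) for p in common)
    total2 = sum(max(sample2[p], floor) for p in common)
    threshold = math.log2(min_fold)
    hits = {}
    for probe in sorted(common):
        share1 = max(sample1[probe], floor) / total1
        share2 = max(sample2[probe], floor) / total2
        log_ratio = math.log2(share2 / share1)
        if abs(log_ratio) >= threshold:
            hits[probe] = log_ratio
    return hits


s1 = {"AAAA": 100, "CCCC": 100, "GGGG": 100, "TTTT": 100}
s2 = {"AAAA": 100, "CCCC": 100, "GGGG": 400, "TTTT": 100}
print(differential_probes(s1, s2))
```

Because each probe's sequence is known, every flagged key is itself a piece of sequence information about the differentially abundant nucleic acid.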

Where the concentrations of the nucleic acids comprising the samples
reflect transcription levels of genes in a sample from which the nucleic acids are derived, the
generic difference screening methods permit identification of differences in transcription
(and, by implication, in expression) of the nucleic acids comprising the two or more
samples. The differentially (e.g., over- or under-) expressed nucleic acids thus identified
can be used (e.g., as probes) to determine and/or isolate those genes whose expression
levels differ between the two or more samples.

The generic difference screening methods are advantageous in that, in
contrast to the expression monitoring methods, they require no a priori assumptions about
the probe oligonucleotide composition of the array. On the contrary, the sequences of the
probe oligonucleotides may be random, haphazard, or any arbitrary subset of
oligonucleotide probes. Where the oligonucleotide probes are short enough (e.g., less than
or equal to a 12-mer), the array may contain every possible nucleic acid of that length.
Although the generic difference screening arrays might be arbitrary or random,
the sequence of each probe in the array is known, so the generic difference screening
methods still provide direct sequence information regarding the differentially expressed
nucleic acids in the samples.
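A complete array of all n-mers holds 4^n probes, so a full 12-mer set has 4^12 = 16,777,216 members. The sketch below enumerates such a set and maps each sequence to a position and back, showing how a known layout ties every array location to a sequence. The lexicographic A<C<G<T layout and all function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def all_nmers(n):
    """Yield every possible n-mer in lexicographic (A < C < G < T) order."""
    for combo in product(BASES, repeat=n):
        yield "".join(combo)


def nmer_index(seq):
    """Position of `seq` in that order: its base-4 value with A=0, C=1, G=2, T=3."""
    index = 0
    for base in seq:
        index = index * 4 + BASES.index(base)  # raises ValueError for non-ACGT bases
    return index


def nmer_at(index, n):
    """Inverse of nmer_index: recover the n-mer stored at a given position."""
    bases = []
    for _ in range(n):
        index, digit = divmod(index, 4)
        bases.append(BASES[digit])
    return "".join(reversed(bases))


print(4 ** 12)              # size of a complete 12-mer array
print(nmer_index("ACGT"))   # position of ACGT among all 4-mers
print(nmer_at(27, 4))       # sequence stored at that position
```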

The expression monitoring and generic difference screening methods of this
invention involve providing an array containing a large number (e.g., greater than 1,000) of
arbitrarily selected different oligonucleotide probes (probe oligonucleotides) where the
sequence and location in the array of each different probe is known. Nucleic acid samples
(e.g., mRNA) are hybridized to the probe arrays and the pattern of hybridization is detected.

It is demonstrated herein and in copending applications U.S. Patent Serial
No. 08/529,115 filed on September 15, 1995 and PCT/US96/14839 that hybridization with
high density oligonucleotide probe arrays provides an effective means of detecting and/or
quantifying the expression of particular nucleic acids in complex nucleic acid populations.
The expression monitoring and difference screening methods of this invention may be used
in a wide variety of circumstances, including detection of disease, identification of
differential gene expression between two samples (e.g., a pathological as compared to a
healthy sample), screening for compositions that upregulate or downregulate the
expression of particular genes, and so forth.

In one preferred embodiment, the methods of this invention are used to
monitor the expression (transcription) levels of nucleic acids whose expression is altered in
a disease state. For example, a cancer may be characterized by the overexpression of a
particular marker such as the HER2 (c-erbB-2/neu) proto-oncogene in the case of breast
cancer. Similarly, overexpression of receptor tyrosine kinases (RTKs) is associated with
the etiology of a number of tumors, including carcinomas of the breast, liver, bladder, and
pancreas, as well as glioblastomas, sarcomas and squamous carcinomas (see Carpenter,
Ann. Rev. Biochem., 56: 881-914 (1987)). Conversely, a cancer (e.g., colorectal, lung and
breast) may be characterized by the mutation of or underexpression of a tumor suppressor
gene such as P53 (see, e.g., Tominaga et al., Critical Rev. in Oncogenesis, 3: 257-282
(1992)).

Where the particular genes of interest are known, the high density arrays
will preferably contain probe oligonucleotides selected to be complementary to the
sequences or subsequences of those genes of interest. High probe redundancy for each
gene of interest can be achieved and absolute expression levels of each gene can be
determined.

Conversely, where it is unknown which genes differ in expression between
the healthy and disease state, the generic difference screening methods of this invention are
particularly appropriate. Hybridization of the healthy and pathological nucleic acids to the
generic difference screening arrays disclosed herein and comparison of the hybridization
patterns identifies those genes whose regulation is altered in the pathological state.

Similarly, the expression monitoring and generic difference screening
methods of this invention can be used to monitor expression of various genes in response
to defined stimuli, such as a drug, cell activation, etc. The methods are particularly
advantageous because they permit simultaneous monitoring of the expression of large
numbers of genes. This is especially useful in drug research if the end point description is
a complex one, not simply asking if one particular gene is overexpressed or
underexpressed. Thus, where a disease state or the mode of action of a drug is not well
characterized, the methods of this invention allow rapid determination of the particularly
relevant genes. Again, where the gene of interest is known or suspected, expression
monitoring methods will preferably be used, while generic screening methods will be used
when the particular genes of interest are unknown.

Using the generic difference screening methods disclosed herein, lack of
knowledge regarding the particular genes does not prevent identification of useful
therapeutics. For example, if the hybridization pattern on a particular high density array
for healthy cells is known, and significantly different from the pattern for a diseased cell,
then libraries of compounds can be screened for those that cause the pattern for a diseased
cell to become like that for the healthy cell. This provides a very detailed measure of the
cellular response to a drug.
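One way to make "become like that for the healthy cell" concrete is to rank compounds by the distance between the treated diseased-cell pattern and the healthy pattern. The Euclidean distance on log10 intensities used here is an assumption chosen for illustration, intensities are assumed positive, and the compound names and values are hypothetical.

```python
import math


def pattern_distance(a, b):
    """Euclidean distance between two hybridization patterns on a log10 scale."""
    if len(a) != len(b) or not a:
        raise ValueError("patterns must be non-empty and cover the same probes")
    return math.sqrt(sum((math.log10(x) - math.log10(y)) ** 2 for x, y in zip(a, b)))


def rank_compounds(healthy, treated):
    """Order compounds by how closely the treated diseased-cell pattern matches healthy cells."""
    return sorted(treated, key=lambda name: pattern_distance(treated[name], healthy))


healthy = [100, 100, 100]  # hypothetical healthy-cell intensities for three probes
treated = {"cmpdA": [1000, 100, 100], "cmpdB": [100, 110, 100]}
print(rank_compounds(healthy, treated))
```

Working on a log scale keeps a tenfold change on a bright probe from swamping the same fold change on a dim one.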
Generic difference screening methods thus provide a powerful tool for gene
discovery and for elucidating mechanisms underlying complex cellular responses to
various stimuli. For example, in one embodiment, generic difference screening can be
used for "expression fingerprinting". Suppose it is found that the mRNA from a certain
cell type displays a distinct overall hybridization pattern that is different under different
conditions (e.g., when harboring mutations in particular genes, or in a disease state). Then
this pattern of expression (an expression fingerprint), if reproducible and clearly
differentiable in the different cases, can be used as a very detailed diagnostic. It is not even
required that the pattern be fully interpretable, but just that it is specific for a particular cell
state (and preferably of diagnostic and/or prognostic relevance).
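Because a fingerprint need only be specific, not interpretable, matching can be purely numerical. A minimal sketch, assuming each cell state has a reference pattern over the same probes and that Pearson correlation is an acceptable similarity measure (an assumption, not a method stated here); the labels and intensities are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov / (sa * sb)


def classify(sample, fingerprints):
    """Return the cell-state label whose reference fingerprint best correlates with `sample`."""
    return max(fingerprints, key=lambda label: pearson(sample, fingerprints[label]))


fingerprints = {"normal": [10, 50, 10, 50], "tumor": [50, 10, 50, 10]}
print(classify([12, 45, 9, 55], fingerprints))
```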

Both expression monitoring methods and generic difference screening may
also be used in drug safety studies. For example, a new antibiotic should not
significantly affect the expression profile of mammalian cells. The
hybridization pattern could thus be used as a detailed measure of the effect of a drug on cells,
in other words, as a toxicological screen.

The expression monitoring and generic difference screening methods of this
invention are particularly well suited for gene discovery. For example, as explained above,
the generic difference screening methods identify differences in abundances of nucleic
acids in two or more samples. These differences may indicate changes in the expression
levels of previously unknown genes. The sequence information provided by a difference
screening array can be utilized, as described herein, to identify the unknown gene.

The expression monitoring methods can be used in gene discovery by
exploiting the fact that many genes that have been discovered to date have been classified
into families based on commonality of the sequences. Because of the extremely large
number of probes it is possible to place in the high density array, it is possible to include
oligonucleotide probes representing known members, or parts of known members, of every gene
class. When such a high density array is used, genes that are already known would
give a positive signal at loci containing both variable and common regions. For unknown
genes, only the common regions of the gene family would give a positive signal. The
result would indicate the possibility of a newly discovered gene.
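The decision logic just described can be written out as a small function. Representing probes as sets of IDs, split into family-wide "common" probes and per-member "variable" probes, is an assumption of this sketch, as are the function name and probe IDs.

```python
def interpret_family(positive, common_probes, member_probes):
    """Interpret gene-family probe signals on an expression monitoring array.

    positive:      set of probe IDs that gave a positive signal
    common_probes: set of probe IDs for regions shared by the whole family
    member_probes: dict mapping each known member to its set of variable-region probe IDs
    """
    known = sorted(name for name, ids in member_probes.items() if positive & ids)
    if known:
        # Variable-region probes of a known member lit up: an already-known gene.
        return "known member(s): " + ", ".join(known)
    if positive & common_probes:
        # Only family-wide (common) probes lit up: possibly a newly discovered gene.
        return "possible new family member"
    return "no family signal"


common = {"c1", "c2", "c3"}
members = {"geneA": {"a1", "a2"}, "geneB": {"b1"}}
print(interpret_family({"c1", "c2"}, common, members))
```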
The expression monitoring and generic difference screening methods of this
invention thus also allow the development of "dynamic" gene databases. The Human
Genome Project and commercial sequencing projects have generated large static databases
which list thousands of sequences without regard to function or genetic interaction.
Analyses using the methods of this invention produce "dynamic" databases that define a
gene's function and its interactions with other genes. Without the ability to monitor the
expression of large numbers of genes simultaneously, or the ability to detect differences in
abundances of large numbers of "unknown" nucleic acids simultaneously, the work of
creating such a database is enormous.

Using DNA sequence analysis to determine an expression pattern is tedious: it
involves preparing a cDNA library from the RNA isolated from the
cells of interest and then sequencing the library. As the DNA is sequenced, the operator
lists the sequences that are obtained and counts them. Thousands of sequences would have
to be determined, and the frequency of those gene sequences would then define the
expression pattern of genes for the cells being studied.

By contrast, using an expression monitoring or generic difference
screening array to obtain the data according to the methods of this invention is relatively
fast and easy. For example, in one embodiment, cells may be stimulated to induce
expression. The RNA is obtained from the cells and then either labeled directly or a cDNA
copy is created. Fluorescent molecules may be incorporated during the DNA
polymerization. Either the labeled RNA or the labeled cDNA is then hybridized to a high
density array in one overnight experiment. The hybridization provides a quantitative
assessment of the levels of every single one of the hybridized nucleic acids with no
additional sequencing. In addition, the methods of this invention are much more sensitive,
allowing a few copies of expressed genes per cell to be detected. This procedure is
demonstrated in the examples provided herein. These uses of the methods of this
invention are intended to be illustrative and not in any manner limiting.

II. High Density Arrays for Generic Difference Screening and
Expression Monitoring.
As indicated above, this invention provides methods of monitoring
(detecting and/or quantifying) the expression levels of a large number of nucleic acids
and/or determining differences in nucleic acid concentrations (abundances) between two or
more samples. The methods involve hybridization of one or more nucleic acid samples
(target nucleic acids) to one or more high density arrays of nucleic acid probes and then
quantifying the amount of target nucleic acids hybridized to each probe in the array.

While nucleic acid hybridization has been used for some time to determine
the expression levels of various genes (e.g., Northern blot), it was a surprising discovery of
this invention that high density arrays are suitable for the quantification of the small
variations in abundance (e.g., transcription and, by implication, expression) of a nucleic
acid (e.g., gene) in the presence of a large population of heterogeneous nucleic acids. The
signal (e.g., particular gene or gene product, or differentially abundant nucleic acid) may be
present at a concentration of less than about 1 in 1,000, and is often present at a
concentration less than 1 in 10,000, more preferably less than about 1 in 50,000, and most
preferably less than about 1 in 100,000, 1 in 300,000, or even 1 in 1,000,000.
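To relate these frequencies to copies per cell, the Fig. 2 description gives one calibration point: 1:3,000 corresponds to about 100 copies per cell, implying roughly 300,000 transcripts per cell. The sketch below applies that single ratio to the frequencies listed above; treating it as a constant for all cells is an assumption, since mRNA content varies between cell types.

```python
# Calibration from the Fig. 2 description: a frequency of 1:3,000 was stated to
# be about 100 copies per cell, implying ~300,000 transcripts per cell.
# Treat this constant as an assumption; real cells vary.
TRANSCRIPTS_PER_CELL = 3_000 * 100


def copies_per_cell(frequency_denominator):
    """Approximate copies per cell for a transcript present at 1 in `frequency_denominator`."""
    return TRANSCRIPTS_PER_CELL / frequency_denominator


for denominator in (1_000, 10_000, 50_000, 100_000, 300_000, 1_000_000):
    print(f"1 in {denominator:,}: ~{copies_per_cell(denominator):g} copies per cell")
```

Under this calibration, 1 in 100,000 is about three copies per cell, consistent with the statement that only a few copies of an expressed gene per cell can be detected.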

The oligonucleotide arrays can have oligonucleotides as short as 10
nucleotides, although 15-mers are more preferred, and 20- or 25-mers are most preferred,
for specifically detecting and quantifying nucleic acid expression levels.
Where ligation discrimination methods are used, the oligonucleotide arrays can contain
shorter oligonucleotides. In this instance, oligonucleotide arrays comprising
oligonucleotides ranging in length from 6 to 15 nucleotides, more preferably from about 8
to about 12 nucleotides in length, are preferred. Of course, arrays containing longer
oligonucleotides, as described herein, are also suitable.

The expression monitoring arrays, which are designed to detect particular
preselected genes, provide for simultaneous monitoring of at least about 10, preferably at
least about 100, more preferably at least about 1,000, still more preferably at least about
10,000, and most preferably at least about 100,000 different genes.

A) Advantages of Oligonucleotide Arrays.

In a preferred embodiment, the high density arrays used in the methods of
this invention comprise chemically synthesized oligonucleotides. The use of chemically
synthesized oligonucleotide arrays, as opposed to, for example, blotted arrays of genomic
clones, restriction fragments, oligonucleotides, and the like, offers numerous advantages.
These advantages generally fall into four categories:

1) Efficiency of production;

2) Reduced intra- and inter-array variability;

3) Increased information content; and

4) Improved signal to noise ratio.



1) Efficiency of production.

In a preferred embodiment, the arrays are synthesized using methods of
spatially addressed parallel synthesis (see, e.g., Section V, below). The oligonucleotides
are synthesized chemically in a highly parallel fashion, covalently attached to the array
surface. This allows extremely efficient array production. For example, arrays containing
any collection of tens (or even hundreds) of thousands of specifically selected 20-mer
oligonucleotides are synthesized in fewer than 80 synthesis cycles. The arrays are designed
and synthesized based on sequence information alone. Thus, unlike blotting methods,
array preparation requires no handling of biological materials. There is no need for cloning
steps, nucleic acid purifications or amplifications, cataloging of clones or amplification
products, and the like. The preferred chemical synthesis of high density oligonucleotide
arrays in this invention is thus more efficient than blotting methods and permits the
production of highly reproducible high-density arrays.
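The figure of fewer than 80 synthesis cycles for arbitrary collections of 20-mers follows from a counting argument. If each cycle couples a single base (A, C, G or T) to whichever features are unmasked, then cycling through the four bases in a fixed order extends every probe by at least one base per round of four cycles, so a 20-mer needs at most 4 × 20 = 80 cycles no matter how many different probes share the array. The Python sketch below simulates that argument. The greedy masking rule and the name `synthesis_schedule` are illustrative assumptions, not the synthesis method of Section V.

```python
from typing import List, Sequence, Tuple

def synthesis_schedule(probes: Sequence[str], bases: str = "ACGT") -> List[Tuple[str, List[int]]]:
    """Greedy simulation of base-by-base, spatially addressed parallel synthesis.

    Each cycle couples one base to every feature (probe index) whose next
    required base matches it; all other features are treated as masked.
    Returns only the cycles actually performed, as (base, [feature indices]).
    """
    progress = [0] * len(probes)  # bases already built at each feature
    longest = max(len(p) for p in probes)
    cycles: List[Tuple[str, List[int]]] = []
    for step in range(len(bases) * longest):  # upper bound: 4 cycles per position
        base = bases[step % len(bases)]
        unmasked = [
            i for i, p in enumerate(probes)
            if progress[i] < len(p) and p[progress[i]] == base
        ]
        if not unmasked:
            continue  # no feature needs this base right now; skip the cycle
        for i in unmasked:
            progress[i] += 1
        cycles.append((base, unmasked))
        if all(progress[i] == len(p) for i, p in enumerate(probes)):
            break  # every probe is complete
    return cycles
```

Running it on any set of 20-mers built from A, C, G and T returns at most 80 cycles; the schedule depends only on probe length, not on the number of distinct probes, which is the efficiency point made above.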

2) Reduced intra- and inter-array variability.

The use of chemically synthesized high-density oligonucleotide arrays in the
methods of this invention reduces intra- and inter-array variability. The oligonucleotide
arrays preferred for this invention are made in large batches (presently 49 arrays per wafer,
with multiple wafers synthesized in parallel) in a highly controlled, reproducible manner.
This makes them suitable as general diagnostic and research tools, permitting direct
comparisons of assays performed at different times and locations.

Because of the precise control obtainable during the chemical synthesis, the
arrays of this invention show less than about 25%, preferably less than about 20%, more
preferably less than about 15%, still more preferably less than about 10%, even more
preferably less than about 5%, and most preferably less than about 2% variation between
high density arrays (within or between production batches) having the same probe
composition. Array variation is assayed as the variation in hybridization intensity (against
a labeled control target nucleic acid mixture) in one or more oligonucleotide probes
between two or more arrays. More preferably, array variation is assayed as the variation in
hybridization intensity (against a labeled control target nucleic acid mixture) measured for
one or more target genes between two or more arrays.
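Array variation is defined above as the variation in hybridization intensity of the same probes, hybridized against a labeled control target mixture, across two or more arrays, but no particular statistic is named. A minimal sketch, assuming the coefficient of variation (sample standard deviation divided by the mean, in percent) is the figure compared against thresholds such as 25% or 2%; the function name and input layout are hypothetical.

```python
import statistics
from typing import Dict, List

def percent_variation(arrays: List[Dict[str, float]]) -> Dict[str, float]:
    """Coefficient of variation (%) of each probe's intensity across arrays.

    `arrays` holds one {probe_id: intensity} mapping per array, all hybridized
    to the same labeled control target mixture. Only probes present on every
    array are compared. At least two arrays are required.
    """
    if len(arrays) < 2:
        raise ValueError("need at least two arrays to measure variation")
    shared = set(arrays[0]).intersection(*arrays[1:])
    result = {}
    for probe_id in sorted(shared):
        values = [a[probe_id] for a in arrays]
        result[probe_id] = 100.0 * statistics.stdev(values) / statistics.mean(values)
    return result
```

For the per-gene measure the text prefers, the same calculation could be applied to a per-gene summary (for example, the mean intensity over that gene's probes) instead of to individual probes.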



In addition to reducing inter- and intra-array variability, chemically
synthesized arrays also reduce variations in relative probe frequency inherent in spotting
methods, particularly spotting methods that use cell-derived nucleic acids (e.g., cDNAs).
Many genes are expressed at the level of thousands of copies per cell, while others are
expressed at only a single copy per cell. A cDNA library made from this material will
reflect this very large bias. While normalization (adjustment of the amount of each
different probe, e.g., by comparison to a reference cDNA) of the library will reduce the
representation of over-expressed sequences to some extent, normalization has been shown
to lessen the odds of selecting highly expressed cDNAs by only about a factor of 2 or 3. In
contrast, chemical synthesis methods can ensure that all oligonucleotide probes are
represented in approximately equal concentrations. This decreases the inter-gene
(intra-array) variability and permits direct comparison between hybridization signals
for different oligonucleotide probes.

3) Increased information content.

i) Advantages for expression monitoring.

The use of high density oligonucleotide arrays for expression monitoring
provides a number of advantages not found with other methods. For example, the use of
large numbers of different probes that specifically bind to the transcription product of a
particular target gene provides a high degree of redundancy and internal control that
permits optimization of probe sets for effective detection of particular target genes and
minimizes the possibility of errors due to cross-reactivity with other nucleic acid species.

Apparently suitable probes often prove ineffective for expression
monitoring by hybridization. For example, certain subsequences of a particular target gene
may be found in other regions of the genome, and probes directed to these subsequences
will cross-hybridize with the other regions and not provide a signal that is a meaningful
measure of the expression level of the target gene. Even probes that show little
cross-reactivity may be unsuitable because they generally show poor hybridization due to
the formation of structures that prevent effective hybridization. Finally, in sets with large
numbers of probes, it is difficult to identify hybridization conditions that are optimal for all
the probes in a set. Because of the high degree of redundancy provided by the large
number of probes for each target gene, it is possible to eliminate those probes that function
poorly under a given set of hybridization conditions and still retain enough probes to a
particular target gene to provide an extremely sensitive and reliable measure of the
expression level (transcription level) of that gene.
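Because each target gene is covered by many probes, poorly performing ones can simply be dropped while enough remain for a reliable measurement. A minimal sketch, assuming each probe has already been given a hypothetical quality score under the chosen hybridization conditions (for example, its specific signal against a known control target); the score, the cutoff `min_score`, and the required `min_probes` are illustrative parameters, not values given in this document.

```python
from typing import Dict, List

def prune_probe_set(scores: Dict[str, float], min_score: float, min_probes: int) -> List[str]:
    """Keep the probes for one target gene whose score meets `min_score`.

    Raises ValueError if too few probes survive, signalling that the gene
    needs redesigned probes or different hybridization conditions.
    """
    kept = sorted(pid for pid, score in scores.items() if score >= min_score)
    if len(kept) < min_probes:
        raise ValueError(f"only {len(kept)} probes pass; at least {min_probes} required")
    return kept
```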

In addition, the use of large numbers of different probes to each target gene
makes it possible to monitor expression of families of closely related nucleic acids. The
probes may be selected to hybridize both with subsequences that are conserved across the
family and with subsequences that differ in the different nucleic acids in the family. Thus,
hybridization with such arrays permits simultaneous monitoring of the various members of
a gene family even where the various genes are approximately the same size and have high
levels of homology. Such measurements are difficult or impossible with traditional
hybridization methods.
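Choosing probes for a gene family amounts to sorting each member's subsequences into those shared by the whole family and those unique to one member. A minimal sketch using exact substring matching, which is a simplification (real cross-hybridization also depends on near matches). The windows returned are target subsequences; the probes themselves would be their complements. The function name and the `probe_len` default are illustrative.

```python
from typing import Dict, List

def classify_family_windows(family: Dict[str, str],
                            probe_len: int = 20) -> Dict[str, Dict[str, List[str]]]:
    """Split each member's probe-length windows into conserved and member-specific ones.

    "conserved": the window occurs in every other family member.
    "specific":  the window occurs in no other family member.
    Windows shared with only some members are left out of both lists.
    """
    result: Dict[str, Dict[str, List[str]]] = {}
    for name, seq in family.items():
        others = [s for other, s in family.items() if other != name]
        conserved: List[str] = []
        specific: List[str] = []
        for start in range(len(seq) - probe_len + 1):
            window = seq[start:start + probe_len]
            hits = sum(window in other for other in others)
            if hits == len(others):
                conserved.append(window)
            elif hits == 0:
                specific.append(window)
        result[name] = {"conserved": conserved, "specific": specific}
    return result
```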

ii) General advantages.

Because the high density arrays contain such a large number of probes, it is
possible to provide numerous controls including, for example, controls for variations or
mutations in a particular gene, controls for overall hybridization conditions, controls for
sample preparation conditions, controls for metabolic activity of the cell from which the
nucleic acids are derived, and mismatch controls for non-specific binding or cross
hybridization.

Effective detection and quantitation of gene transcription in complex
mammalian cell message populations can be determined with relatively short
oligonucleotides and with relatively few (e.g., fewer than 40, preferably fewer than 30, more
preferably fewer than 25, and most preferably fewer than 20, 15, or even 10)
oligonucleotide probes per gene. There are a large number of probes which hybridize both
strongly and specifically for each gene. This does not mean that a large number of probes
is required for detection, but rather that there are many from which to choose and that
choices can be based on other considerations such as sequence uniqueness (gene families),
checking for splice variants, or genotyping hot spots (things not easily done with cDNA
spotting methods).

In use, sets of four arrays for expression monitoring are made that contain
approximately 400,000 probes each. Sets of about 40 probes (20 probe pairs) are chosen
that are complementary to each of about 40,000 genes for which there are ESTs in the 
public database. This set of ESTs covers roughly one-third to one-half of all human genes 
and these arrays will allow the levels of all of them to be monitored in a parallel set of 
overnight hybridizations. 
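The probe budget in this example can be checked directly: 20 probe pairs (40 probes) for each of about 40,000 genes is 1,600,000 probes, or four arrays of about 400,000 probes. The sketch below also shows one way such probe pairs could be laid out. It assumes that each pair consists of a perfect-match probe complementary to the target and a mismatch partner in which only the central base is replaced by its complement, and that probes are spaced evenly along the target; these design rules, and the names `reverse_complement` and `probe_pairs`, are illustrative assumptions rather than the selection procedure of this document.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def probe_pairs(target: str, n_pairs: int = 20, probe_len: int = 20) -> List[Tuple[str, str]]:
    """Evenly spaced (perfect match, mismatch) probe pairs for one target sequence."""
    span = len(target) - probe_len
    if span < 0:
        return []  # target shorter than one probe
    step = max(1, span // max(1, n_pairs - 1))
    mid = probe_len // 2
    pairs = []
    for start in range(0, span + 1, step):
        pm = reverse_complement(target[start:start + probe_len])
        # mismatch partner: replace the central base with its complement
        mm = pm[:mid] + pm[mid].translate(COMPLEMENT) + pm[mid + 1:]
        pairs.append((pm, mm))
        if len(pairs) == n_pairs:
            break
    return pairs

# Probe budget for the four-array design described above.
genes, pairs_per_gene, probes_per_array = 40_000, 20, 400_000
total_probes = genes * pairs_per_gene * 2
print(total_probes, total_probes // probes_per_array)  # 1600000 4
```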


4) Improved signal to noise ratio.

Blotted nucleic acids sometimes rely on ionic, electrostatic, and
hydrophobic interactions to attach the blotted nucleic acids to the substrate. Bonds are
formed at multiple points along the nucleic acid, restricting degrees of freedom and
interfering with the ability of the nucleic acid to hybridize to its complementary target. In
contrast, the preferred arrays of this invention are chemically synthesized. The
oligonucleotide probes are attached to the substrate by a single terminal covalent bond.
The probes have more degrees of freedom and are capable of participating in complex
interactions with their complementary targets. Consequently, the probe arrays of this
invention show significantly higher hybridization efficiencies (10 times, 100 times, and
even 1000 times more efficient) than blotted arrays. Less target nucleic acid is needed to
produce a given signal, thereby dramatically improving the signal to noise ratio.
Consequently, the methods of this invention permit detection of only a few copies of a
nucleic acid in extremely complex nucleic acid mixtures.



B) Preferred High Density Arrays

Preferred high density arrays of this invention comprise greater than about
100, preferably greater than about 1000, more preferably greater than about 16,000, and
most preferably greater than about 65,000 or 250,000 or even greater than about 1,000,000
different oligonucleotide probes. The oligonucleotide probes range from about 5 to about
50 or about 5 to about 45 nucleotides, more preferably from about 10 to about 40
nucleotides, and most preferably from about 15 to about 40 nucleotides in length. In
particular preferred embodiments, the oligonucleotide probes are 20 or 25 nucleotides in
length, while in other preferred embodiments (particularly where ligation discrimination
reactions are used) the oligonucleotide probes are preferably shorter (e.g., 6 to 20, more
preferably 8 to 15 nucleotides in length). It was a discovery of this invention that relatively
short oligonucleotide probes are sufficient to specifically hybridize to and distinguish target
sequences. Thus, in one preferred embodiment, the oligonucleotide probes are less than 50
nucleotides in length, generally less than 46 nucleotides, more generally less than 41
nucleotides, most generally less than 36 nucleotides, preferably less than 31 nucleotides,
more preferably less than 26 nucleotides, and most preferably less than 21 nucleotides in
length. The probes can also be less than 16 nucleotides, less than 13 nucleotides, less than
9 nucleotides, or less than 7 nucleotides in length. It is also recognized
that the oligonucleotide probes can be relatively long, ranging in length up to about 1000
nucleotides, more typically up to about 500 nucleotides in length.

The location and, in some embodiments, sequence of each different
oligonucleotide probe in the array is known. Moreover, the large number of different
probes occupies a relatively small area, providing a high density array having a probe
density of generally greater than about 60, more generally greater than about 100, most
generally greater than about 600, often greater than about 1000, more often greater than
about 5,000, most often greater than about 10,000, preferably greater than about 40,000,
more preferably greater than about 100,000, and most preferably greater than about
400,000 different oligonucleotide probes per cm². The small surface area of the array
(often less than about 10 cm², preferably less than about 5 cm², more preferably less than
about 2 cm², and most preferably less than about 1.6 cm²) permits the use of small sample
volumes and extremely uniform hybridization conditions (temperature regulation, salt
content, etc.) while the extremely large number of probes allows massively parallel
processing of hybridizations.

Finally, because of the small area occupied by the high density arrays,
hybridization may be carried out in extremely small fluid volumes (e.g., … µl or less,
more preferably 100 µl or less, and most preferably 10 µl or less). In addition,
hybridization conditions are extremely uniform throughout the sample, and the
hybridization format is amenable to automated processing.

III. Monitoring Gene Expression and Generic Difference Screening.

As explained above, this invention provides methods for monitoring gene
expression (expression monitoring) and for identifying differences in abundance
(concentration) of nucleic acids in two or more nucleic acid samples (generic difference
screening). Generally, the methods of monitoring gene expression of this invention involve
(1) providing a pool of target nucleic acids comprising RNA transcript(s) of one or more
target gene(s), or nucleic acids derived from the RNA transcript(s); (2) hybridizing the
nucleic acid sample to a high density array of probes (including control probes); and (3)
detecting the hybridized nucleic acids and calculating a relative expression (transcription)
level. These methods preferably involve the use of high density oligonucleotide arrays
containing probes to specifically preselected genes.

In contrast, the arrays used in the generic difference screening methods of
this invention do not require that specific target genes be identified. To the contrary, the
methods are designed to detect changes or differences in expression of various genes where
the particular gene to be identified is unknown prior to performing the difference
screening.

The methods of generic difference screening typically involve the steps of:

1) providing one or more high density oligonucleotide arrays (preferably including probe
pairs differing in one or more nucleotides); 2) providing two or more nucleic acid samples;
3) hybridizing the nucleic acid samples to one or more arrays to form hybrid duplexes
between nucleic acids in the nucleic acid samples and probe oligonucleotides in the
array(s); 4) detecting the hybridization of the nucleic acids to the array(s); and 5)
determining the differences in hybridization between the nucleic acid samples.
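Step 5), determining the differences in hybridization between samples, can be as simple as comparing each probe's intensity in two hybridizations. A minimal sketch, assuming the two intensity sets are already on a comparable scale (any array-to-array normalization is omitted), that a floor is applied so background-level signals do not produce huge spurious ratios, and that a probe is flagged when its intensity changes by at least `min_fold`; the names and default thresholds are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def differing_probes(sample_a: Dict[str, float], sample_b: Dict[str, float],
                     min_fold: float = 2.0, floor: float = 1.0) -> List[Tuple[str, float]]:
    """Return (probe_id, log2(b / a)) for probes whose intensity differs by >= min_fold."""
    threshold = math.log2(min_fold)
    hits = []
    for probe_id in sorted(set(sample_a) & set(sample_b)):
        a = max(sample_a[probe_id], floor)  # clamp near-zero signals to the floor
        b = max(sample_b[probe_id], floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= threshold:
            hits.append((probe_id, log_ratio))
    return hits
```

Because no target genes are specified in advance, the output of such a screen is a list of probes (whose sequences and locations are known); identifying the genes behind the flagged probes is a separate, later step.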

The provision of a nucleic acid sample, the hybridization of the sample to
the arrays, and detection of the hybridized nucleic acid(s) are performed in essentially the
same manner in expression monitoring and in generic difference screening methods. As
disclosed herein, in preferred embodiments, the methods are distinguished, in part, by
oligonucleotide probe selection, by the use of at least two nucleic acid samples in generic
difference screening, and by subsequent analysis.

A) Providing a Nucleic Acid Sample.

In order to measure the nucleic acid concentration in a sample, it is
desirable to provide a nucleic acid sample for such analysis. Where it is desired that the
nucleic acid concentration, or differences in nucleic acid concentration between different
samples, reflect transcription levels or differences in transcription levels of a gene or genes,
it is desirable to provide a nucleic acid sample comprising mRNA transcript(s) of the gene
or genes, or nucleic acids derived from the mRNA transcript(s). As used herein, a nucleic
acid derived from an mRNA transcript refers to a nucleic acid for whose synthesis the
mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a
cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA
amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all
derived from the mRNA transcript, and detection of such derived products is indicative of
the presence and/or abundance of the original transcript in a sample. Thus, suitable
samples include, but are not limited to, mRNA transcripts of the gene or genes, cDNA
reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified
from the genes, RNA transcribed from amplified DNA, and the like.

In a particularly preferred embodiment, where it is desired to quantify the
transcription level (and thereby expression) of one or more genes in a sample, the nucleic
acid sample is one in which the concentration of the mRNA transcript(s) of the gene or
genes, or the concentration of the nucleic acids derived from the mRNA transcript(s), is
proportional to the transcription level (and therefore expression level) of that gene.
Similarly, it is preferred that the hybridization signal intensity be proportional to the
amount of hybridized nucleic acid. While it is preferred that the proportionality be
relatively strict (e.g., a doubling in transcription rate results in a doubling in mRNA






WO 97/27317 PCT/US97/01603



transcript in the sample nucleic acid pool and a doubling in hybridization signal), one of
skill will appreciate that the proportionality can be more relaxed and even non-linear.
Thus, for example, an assay where a 5-fold difference in concentration of the target mRNA
results in a 3- to 6-fold difference in hybridization intensity is sufficient for most purposes.
Where more precise quantification is required, appropriate controls can be run to correct for
variations introduced in sample preparation and hybridization as described herein. In
addition, serial dilutions of "standard" target mRNAs can be used to prepare calibration
curves according to methods well known to those of skill in the art. Of course, where
simple detection of the presence or absence of a transcript, or of large differences or changes
in nucleic acid concentration, is desired, no elaborate control or calibration is required.
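A calibration curve prepared from serial dilutions of a standard mRNA can be fit and then inverted to read unknown concentrations off measured intensities. A minimal sketch, assuming a power-law response (a straight line in log-log coordinates) fit by ordinary least squares; a slope of 1 corresponds to the strict proportionality described above, and other slopes to the relaxed, non-linear case. The model choice and the function names are assumptions, not a method prescribed by this document.

```python
import math
from typing import Sequence, Tuple

def fit_calibration(concentrations: Sequence[float],
                    intensities: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(intensity) = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(i) for i in intensities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_concentration(intensity: float, slope: float, intercept: float) -> float:
    """Invert the fitted curve to estimate the concentration giving `intensity`."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)
```

Under this model a 5-fold concentration change produces a 5^slope-fold intensity change, so the 3- to 6-fold response mentioned above corresponds to slopes between about log 3 / log 5 ≈ 0.68 and log 6 / log 5 ≈ 1.11.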

In the simplest embodiment, such a nucleic acid sample is the total mRNA
or a total cDNA isolated and/or otherwise derived from a biological sample. The term
"biological sample", as used herein, refers to a sample obtained from an organism or from
components (e.g., cells) of an organism. The sample may be of any biological tissue or
fluid. Frequently the sample will be a "clinical sample", which is a sample derived from a
patient. Such samples include, but are not limited to, sputum, blood, blood cells (e.g.,
white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid,
or cells therefrom. Biological samples may also include sections of tissues such as frozen
sections taken for histological purposes.

The nucleic acid (either genomic DNA or mRNA) may be isolated from the
sample according to any of a number of methods well known to those of skill in the art.
One of skill will appreciate that where alterations in the copy number of a gene are to be
detected, genomic DNA is preferably isolated. Conversely, where expression levels of a
gene or genes are to be detected, preferably RNA (mRNA) is isolated.

Methods of isolating total mRNA are well known to those of skill in the art.
For example, methods of isolation and purification of nucleic acids are described in detail
in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology:
Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P.
Tijssen, ed. Elsevier, N.Y. (1991) and Chapter 3 of Laboratory Techniques in
Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I.
Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier, N.Y. (1993).

In a preferred embodiment, the total nucleic acid is isolated from a given
sample using, for example, an acid guanidinium-phenol-chloroform extraction method and
polyA+ mRNA is isolated by oligo dT column chromatography or by using (dT)n magnetic
beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.),
Vols. 1-3, Cold Spring Harbor Laboratory, (1989), or Current Protocols in Molecular
Biology, F. Ausubel et al., ed. Greene Publishing and Wiley-Interscience, New York
(1987)).

Frequently, it is desirable to amplify the nucleic acid sample prior to
hybridization. One of skill in the art will appreciate that whatever amplification method is
used, if a quantitative result is desired, care must be taken to use a method that maintains
or controls for the relative frequencies of the amplified nucleic acids.

Methods of "quantitative" amplification are well known to those of skill in
the art. For example, quantitative PCR involves simultaneously co-amplifying a known
quantity of a control sequence using the same primers. This provides an internal standard
that may be used to calibrate the PCR reaction. The high density array may then include
probes specific to the internal standard for quantification of the amplified nucleic acid.

One preferred internal standard is a synthetic AW106 cRNA. The AW106
cRNA is combined with RNA isolated from the sample according to standard techniques
known to those of skill in the art. The RNA is then reverse transcribed using a reverse
transcriptase to provide copy DNA. The cDNA sequences are then amplified (e.g., by
PCR) using labeled primers. The amplification products are separated, typically by
electrophoresis, and the amount of radioactivity (proportional to the amount of amplified
product) is determined. The amount of mRNA in the sample is then calculated by
comparison with the signal produced by the known AW106 RNA standard. Detailed
protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and
Applications, Innis et al., Academic Press, Inc. N.Y., (1990).
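The last step of the internal-standard method is a proportion: because the standard and the sample mRNA are co-amplified with the same primers, the ratio of their product signals is taken to equal the ratio of their starting amounts. A minimal sketch under that equal-efficiency assumption; the optional background subtraction and the function name are illustrative additions.

```python
def amount_from_internal_standard(sample_signal: float, standard_signal: float,
                                  standard_amount: float, background: float = 0.0) -> float:
    """Estimate the starting amount of sample mRNA from a co-amplified standard.

    Assumes sample and standard amplify with equal efficiency, so
    sample_amount / standard_amount == sample_signal / standard_signal.
    Signals may be in any common unit (e.g., counts of incorporated radiolabel);
    the result is in the units of `standard_amount`.
    """
    sample = sample_signal - background
    standard = standard_signal - background
    if standard <= 0:
        raise ValueError("standard signal must exceed background")
    return standard_amount * sample / standard
```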

Other suitable amplification methods include, but are not limited to,
polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A Guide to Methods and
Applications. Academic Press, Inc. San Diego, (1990)), ligase chain reaction (LCR) (see
Wu and Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077 (1988)
and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh, et al., Proc.
Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli,
et al., Proc. Natl. Acad. Sci. USA, 87: 1874 (1990)).

In a particularly preferred embodiment, the sample mRNA is reverse
transcribed with a reverse transcriptase and a primer consisting of oligo dT and a sequence
encoding the phage T7 promoter to provide a single stranded DNA template. The second
DNA strand is polymerized using a DNA polymerase. After synthesis of double-stranded
cDNA, T7 RNA polymerase is added and RNA is transcribed from the cDNA template.
Successive rounds of transcription from each single cDNA template result in amplified
RNA. Methods of in vitro polymerization are well known to those of skill in the art (see,
e.g., Sambrook, supra), and this particular method is described in detail by Van Gelder, et
al., Proc. Natl. Acad. Sci. USA, 87: 1663-1667 (1990), who demonstrate that in vitro
amplification according to this method preserves the relative frequencies of the various
RNA transcripts. Moreover, Eberwine et al., Proc. Natl. Acad. Sci. USA, 89: 3010-3014
provide a protocol that uses two rounds of amplification via in vitro transcription to
achieve greater than 10⁶-fold amplification of the original starting material, thereby
permitting expression monitoring even where biological samples are limited.

It will be appreciated by one of skill in the art that the direct transcription
method described above provides an antisense RNA (aRNA) pool. Where antisense RNA is
used as the target nucleic acid, the oligonucleotide probes provided in the array are chosen
to be complementary to subsequences of the antisense nucleic acids. Conversely, where
the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide probes are
selected to be complementary to subsequences of the sense nucleic acids. Finally, where
the nucleic acid pool is double stranded, the probes may be of either sense, as the target
nucleic acids include both sense and antisense strands.
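The rule for matching probe sense to the target pool can be written out for a single region of an mRNA. A minimal sketch: the function name and the "sense" / "antisense" / "double" labels are illustrative, probes are written as DNA 5'->3', and only perfect Watson-Crick complementarity is considered.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def probes_for_region(mrna_region: str, target_pool: str) -> Tuple[str, ...]:
    """DNA probe sequence(s), 5'->3', that hybridize to targets derived from this mRNA region.

    target_pool: "sense" (same sequence as the mRNA), "antisense" (e.g., aRNA
    from the T7 transcription method), or "double" (double-stranded targets).
    """
    sense_dna = mrna_region.upper().replace("U", "T")
    antisense_probe = sense_dna.translate(_COMPLEMENT)[::-1]  # pairs with sense targets
    sense_probe = sense_dna                                   # pairs with antisense targets
    if target_pool == "sense":
        return (antisense_probe,)
    if target_pool == "antisense":
        return (sense_probe,)
    if target_pool == "double":
        return (antisense_probe, sense_probe)  # both strands present, so either probe works
    raise ValueError(f"unknown target pool: {target_pool!r}")
```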

The protocols cited above include methods of generating pools of either
sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense
or antisense nucleic acids as desired. For example, the cDNA can be directionally cloned
into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by
the T3 and T7 promoters. In vitro transcription with the T3 polymerase will produce RNA
of one sense (the sense depending on the orientation of the insert), while in vitro
transcription with the T7 polymerase will produce RNA having the opposite sense. Other
suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid
subcloning (see, e.g., Palazzolo et al., Gene, 88: 25-36 (1990)).

In a particularly preferred embodiment, a high activity RNA polymerase
(e.g., about 2500 units/µL for T7, available from Epicentre Technologies) is used.

B) Labeling nucleic acids 

i) Labeling methods/strategies. 

In a preferred embodiment, the hybridized nucleic acids are detected by
detecting one or more labels attached to the sample nucleic acids. The labels may be
incorporated by any of a number of means well known to those of skill in the art.
However, in a preferred embodiment, the label is simultaneously incorporated during the
amplification step in the preparation of the sample nucleic acids. For example, polymerase
chain reaction (PCR) with labeled primers or labeled nucleotides will provide a labeled
amplification product. The nucleic acid (e.g., DNA) is amplified in the presence of
labeled deoxynucleotide triphosphates (dNTPs). The amplified nucleic acid can be
fragmented, exposed to an oligonucleotide array, and the extent of hybridization
determined by the amount of label now associated with the array. In a preferred
embodiment, transcription amplification, as described above, using a labeled nucleotide
(e.g., fluorescein-labeled UTP and/or CTP) incorporates a label into the transcribed nucleic
acids.

Alternatively, a label may be added directly to the original nucleic acid
sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product after the
amplification is completed. Such labeling can result in the increased yield of amplification
products and reduce the time required for the amplification reaction. Means of attaching
labels to nucleic acids include, for example, nick translation or end-labeling (e.g., with a
labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a
nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore). End
labeling is discussed in more detail below in Section III(B)(iii).

Detectable labels suitable for use in the present invention include any
composition detectable by spectroscopic, photochemical, biochemical, immunochemical,
electrical, optical or chemical means. Useful labels in the present invention include biotin
for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™),
fluorescent dyes (e.g., fluorescein, Texas Red, rhodamine, green fluorescent protein, and the
like; see, e.g., Molecular Probes, Eugene, Oregon, USA), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C,
or ³²P), enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly
used in an ELISA), and colorimetric labels such as colloidal gold (e.g., gold particles in the
40-80 nm diameter size range scatter green light with high efficiency) or colored glass or
plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such
labels include U.S. Patent Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437;
4,275,149; and 4,366,241.

A fluorescent label is preferred because it provides a very strong signal with
low background. It is also optically detectable at high resolution and sensitivity through a
quick scanning procedure. The nucleic acid samples can all be labeled with a single label,
e.g., a single fluorescent label. Alternatively, in another embodiment, different nucleic acid
samples can be simultaneously hybridized where each nucleic acid sample has a different
label. For instance, one target could have a green fluorescent label and a second target
could have a red fluorescent label. The scanning step will distinguish sites of binding of
the red label from those binding the green fluorescent label. Each nucleic acid sample
(target nucleic acid) can be analyzed independently of the others.

Suitable chromogens which can be employed include those molecules and
compounds which absorb light in a distinctive range of wavelengths so that a color can be
observed or, alternatively, which emit light when irradiated with radiation of a particular
wavelength or wavelength range, e.g., fluorescers.

A wide variety of suitable dyes are available, being primarily chosen to
provide an intense color with minimal absorption by their surroundings. Illustrative dye
types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine dyes, phthaleins,
insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and
phenazoxonium dyes.

A wide variety of fluorescers can be employed either alone or,
alternatively, in conjunction with quencher molecules. Fluorescers of interest fall into a
variety of categories having certain primary functionalities. These primary functionalities
include 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary
phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes,
oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bisbenzoxazole,
bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts,
hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen,
indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins,
triarylmethanes and flavin. Individual fluorescent compounds which have functionalities
for linking or which can be modified to incorporate such functionalities include, e.g.,
dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol;
rhodamineisothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl
2-amino-6-sulfonatonaphthalene; 4-acetamido-4'-isothiocyanato-stilbene-2,2'-disulfonic
acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl
2-aminonaphthalene-6-sulfonate; ethidium bromide; stebrine; auramine O;
2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl
oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine; 4-(3'-pyrenyl)butyrate;
d-3-aminodesoxy-equilenin; 12-(9'-anthroyl)stearate; 2-methylanthracene;
9-vinylanthracene; 2,2'-(vinylene-p-phenylene)bisbenzoxazole; p-bis[2-(4-methyl-5-
phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol;
bis(3'-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellebrigenin;
chlorotetracycline; N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-[p-(2-
benzimidazolyl)-phenyl]maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid);
resazurin; 4-chloro-7-nitro-2,1,3-benzoxadiazole; merocyanine 540; resorufin; rose bengal;
and 2,4-diphenyl-3(2H)-furanone.

Desirably, fluorescers should absorb light above about 300 nm, preferably
above about 350 nm, and more preferably above about 400 nm, usually emitting at
wavelengths greater than about 10 nm higher than the wavelength of the light absorbed. It
should be noted that the absorption and emission characteristics of the bound dye can differ
from those of the unbound dye. Therefore, when referring to the various wavelength ranges
and characteristics of the dyes, it is intended to indicate the dyes as employed and not the
dye which is unconjugated and characterized in an arbitrary solvent.
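The absorption and emission preferences above can be expressed as a simple screening rule. The sketch below is illustrative only: the `Dye` record, the function name and the example dyes are hypothetical, and the wavelengths are meant to be those of the dye as conjugated, per the preceding caveat.

```python
from dataclasses import dataclass


@dataclass
class Dye:
    name: str
    absorption_nm: float  # absorption wavelength of the dye as employed (bound), not free in solvent
    emission_nm: float    # emission wavelength of the dye as employed


def meets_spectral_criteria(dye: Dye,
                            min_absorption_nm: float = 300.0,
                            min_shift_nm: float = 10.0) -> bool:
    """True if the dye absorbs above min_absorption_nm and emits more than
    min_shift_nm above its absorption wavelength."""
    absorbs_high_enough = dye.absorption_nm > min_absorption_nm
    shift_large_enough = (dye.emission_nm - dye.absorption_nm) > min_shift_nm
    return absorbs_high_enough and shift_large_enough


# Hypothetical candidates screened at the "more preferred" 400 nm floor.
candidates = [
    Dye("hypothetical_A", absorption_nm=490, emission_nm=520),
    Dye("hypothetical_B", absorption_nm=280, emission_nm=330),
    Dye("hypothetical_C", absorption_nm=450, emission_nm=455),
]
accepted = [d.name for d in candidates if meets_spectral_criteria(d, min_absorption_nm=400)]
print(accepted)  # ['hypothetical_A']
```

Passing 350 or 400 as `min_absorption_nm` selects the preferred and more preferred tiers described in the text.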




Fluorescers are generally preferred because by irradiating a fluorescer with
light, one can obtain a plurality of emissions. Thus, a single label can provide for a
plurality of measurable events.

Detectable signal can also be provided by chemiluminescent and
bioluminescent sources. Chemiluminescent sources include a compound which becomes
electronically excited by a chemical reaction and can then emit light which serves as the
detectable signal or donates energy to a fluorescent acceptor. Many diverse families
of compounds have been found to provide chemiluminescence under a variety of
conditions. One family of compounds is 2,3-dihydro-1,4-phthalazinedione. The most
popular compound is luminol, which is the 5-amino compound. Other members of the
family include the 5-amino-6,7,8-trimethoxy- and the dimethylamino[ca]benz analog.
These compounds can be made to luminesce with alkaline hydrogen peroxide or calcium
hypochlorite and base. Another family of compounds is the 2,4,5-triphenylimidazoles,
with lophine as the common name for the parent product. Chemiluminescent analogs
include para-dimethylamino and -methoxy substituents. Chemiluminescence can also be
obtained with oxalates, usually oxalyl active esters, e.g., p-nitrophenyl, and a peroxide, e.g.,
hydrogen peroxide, under basic conditions. Alternatively, luciferins can be used in
conjunction with luciferase or lucigenins to provide bioluminescence.

Spin labels are provided by reporter molecules with an unpaired electron
spin which can be detected by electron spin resonance (ESR) spectroscopy. Exemplary
spin labels include organic free radicals and transition metal complexes, particularly of
vanadium, copper, iron, and manganese, and the like. Exemplary organic free radical spin
labels include nitroxide free radicals.

The label may be added to the target (sample) nucleic acid(s) prior to, or
after, the hybridization. So-called "direct labels" are detectable labels that are directly
attached to or incorporated into the target (sample) nucleic acid prior to hybridization. In
contrast, so-called "indirect labels" are joined to the hybrid duplex after hybridization.
Often, the indirect label is attached to a binding moiety that has been attached to the target
nucleic acid prior to the hybridization. Thus, for example, the target nucleic acid may be
biotinylated before the hybridization. After hybridization, an avidin-conjugated
fluorophore will bind the biotin-bearing hybrid duplexes, providing a label that is easily
detected. For a detailed review of methods of labeling nucleic acids and detecting labeled
hybridized nucleic acids see Laboratory Techniques in Biochemistry and Molecular
Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y.
(1993).

Fluorescent labels are preferred and easily added during an in vitro
transcription reaction. In a preferred embodiment, fluorescein labeled UTP and CTP are
incorporated into the RNA produced in an in vitro transcription reaction as described
above.

The labels can be attached directly or through a linker moiety. In general,
the site of label or linker-label attachment is not limited to any specific position. For
example, a label may be attached to a nucleoside, nucleotide, or analogue thereof at any
position that does not interfere with detection or hybridization as desired. For example,
certain Label-ON Reagents from Clontech (Palo Alto, CA) provide for labeling
interspersed throughout the phosphate backbone of an oligonucleotide and for terminal
labeling at the 3' and 5' ends. As shown for example herein, labels can be attached at
positions on the ribose ring or the ribose can be modified and even eliminated as desired.
The base moieties of useful labeling reagents can include those that are naturally occurring
or modified in a manner that does not interfere with the purpose to which they are put.
Modified bases include but are not limited to 7-deaza A and G, 7-deaza-8-aza A and G,
and other heterocyclic moieties.

ii) End-labeling nucleic acids.

In many applications it is useful to directly label nucleic acid samples
without having to go through an amplification, transcription or other nucleic acid
conversion step. This is especially true for monitoring of mRNA levels where one would
like to extract total cytoplasmic RNA or poly A+ RNA (mRNA) from cells and hybridize
this material without any intermediate steps which could skew the original distribution of
mRNA concentrations.

In general, end-labeling methods permit the optimization of the size of the
nucleic acid to be labeled. End-labeling methods also decrease the sequence bias
sometimes associated with polymerase-facilitated labeling methods. End labeling can be
performed using terminal transferase (TdT).

End labeling can also be accomplished by ligating a labeled oligonucleotide
or analog thereof to the end of a target nucleic acid or probe. Other end-labeling methods
include the creation of a labeled or unlabeled "tail" for the nucleic acid using ligase or
terminal transferase, for example. The tailed nucleic acid is then exposed to a labeled
moiety that will preferentially associate with the tail. The tail and the moiety that
preferentially associates with the tail can be a polymer such as a nucleic acid, peptide, or
carbohydrate. The tail and its recognition moiety can be anything that permits recognition
between the two, and include molecules having ligand-substrate relationships such as
haptens, epitopes, antibodies, enzymes and their substrates, and complementary nucleic
acids and analogs thereof.

The labels associated with the tail or the tail recognition moiety include
detectable moieties. When the tail and its recognition moiety are both labeled, the
respective labels associated with each can themselves have a ligand-substrate relationship.
The respective labels can also comprise energy transfer reagents such as dyes having
different spectroscopic characteristics. The energy transfer pair can be chosen to obtain the
desired combined spectral characteristics. For example, a first dye that absorbs at a
wavelength shorter than that absorbed by the second dye can, upon absorption at that
shorter wavelength, transfer energy to the second dye. The second dye then emits
electromagnetic radiation at a wavelength longer than would have been emitted by the first
dye alone. Energy transfer reagents can be particularly useful in two-color labeling
schemes such as those set forth in a copending U.S. patent application, filed December 23,
1996, Attorney Docket No. 2013.2, which is a continuation-in-part of USSN
08/529,115, filed September 15, 1995, and Int'l Appln. No. WO 96/14839, filed September
13, 1996, which is also a continuation-in-part of USSN 08/670,118, filed on June 25, 1996,
which is a division of USSN 08/168,904, filed December 15, 1993, which is a continuation
of USSN 07/624,114, filed December 6, 1990. USSN 07/624,114 is a CIP of USSN
07/362,901, filed June 7, 1990, incorporated herein by reference.
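The wavelength ordering that defines a useful energy transfer pair can be checked mechanically. This is a minimal sketch with hypothetical dye values and function names; it tests only the ordering described above and ignores donor-acceptor spectral overlap and distance, which govern transfer efficiency in practice.

```python
from dataclasses import dataclass


@dataclass
class Dye:
    name: str
    absorption_nm: float  # absorption wavelength of the dye as employed
    emission_nm: float    # emission wavelength of the dye as employed


def is_energy_transfer_pair(donor: Dye, acceptor: Dye) -> bool:
    """True if the donor absorbs at a shorter wavelength than the acceptor and the
    acceptor emits at a longer wavelength than the donor would alone."""
    return (donor.absorption_nm < acceptor.absorption_nm
            and acceptor.emission_nm > donor.emission_nm)


def excitation_to_readout_gap_nm(donor: Dye, acceptor: Dye) -> float:
    """Gap between the exciting light (donor absorption) and the light read out
    (acceptor emission) when the pair is used together."""
    return acceptor.emission_nm - donor.absorption_nm


donor = Dye("hypothetical_donor", absorption_nm=490, emission_nm=520)
acceptor = Dye("hypothetical_acceptor", absorption_nm=550, emission_nm=580)
print(is_energy_transfer_pair(donor, acceptor))       # True
print(is_energy_transfer_pair(acceptor, donor))       # False
print(excitation_to_readout_gap_nm(donor, acceptor))  # 90
```

The larger gap between excitation and readout is what makes such pairs convenient for separating two colors.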

This invention thus provides methods of labeling a nucleic acid and
reagents useful therefor. Many of the methods disclosed herein involve end-labeling.
Those skilled in the art will appreciate that the invention as disclosed is generally
applicable in the chemical and molecular-biological arts.

In one embodiment the method involves providing a nucleic acid,
providing a labeled oligonucleotide and enzymatically ligating the oligonucleotide to the
nucleic acid. Thus, for example, where the nucleic acid is an RNA, a labeled
oligoribonucleotide can be ligated using an RNA ligase. RNA ligase catalyzes the covalent
joining of single-stranded RNA (or DNA, but the reaction with RNA is more efficient)
with a 5' phosphate group to the 3'-OH end of another piece of RNA (or DNA). The
specific requirements for the use of this enzyme are provided in The Enzymes, Volume XV,
Part B, T4 RNA Ligase, Uhlenbeck and Gumport, pages 31-58; and pages 5.66-5.69 in
Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press,
Cold Spring Harbor, New York (1982).

This invention thus provides a method to add a label to the nucleic acid (e.g.
extracted RNA) directly rather than incorporating labeled nucleotides in a nucleic acid
polymerization step. This can be accomplished by adding a short labeled oligonucleotide
to the ends of a single stranded nucleic acid. The method more fully labels a sample; a
higher percentage of available molecules will be labeled than by conventional techniques.

RNA can be randomly fragmented with heat in the presence of Mg2+. This
generally produces RNA fragments with 5' OH groups and phosphorylated 3' ends. A
phosphate group is added to the 5' ends of the fragments using standard protocols with T4
polynucleotide kinase, or a similar enzyme. To the pool of 5'-phosphorylated RNA
fragments is added RNA ligase plus a short RNA oligonucleotide with a 3' OH group and a
label, either at the 5' end (such as fluorescein or other dye, or biotin for later labeling with a
streptavidin conjugate, or digoxigenin for later labeling with a labeled antibody) or
with one or more labeled bases. A riboA6 (ribonucleic acid 6 mer poly A) labeled
with either fluorescein or biotin at the 5' end provides a particularly preferred label. In
another embodiment, the ligated RNA oligonucleotide can have ribonucleotides near the
ligation end, but deoxyribonucleotides further away. Of course, the RNA oligonucleotide
can be longer or shorter and can have virtually any sequence. However, the ligation
reaction is most efficient with A and least efficient with U at the 3' end of the acceptor.
The reaction is allowed to proceed under standard conditions. Unincorporated RNA
6-mers can be removed by a simple size selection step (e.g. electrophoresis, NAP column,
etc.) if necessary following the ligation reaction.

An advantage of this procedure is that extracted mRNA can be used directly
and that each fragment should be labeled once, not any number of times depending on the
sequence as is the case when labeled bases are incorporated during polymerization
reactions.
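The contrast between one label per fragment and sequence-dependent labeling can be made concrete with a toy count. This sketch assumes an idealized, hypothetical case in which every U and C in a transcript made with labeled UTP and CTP carries a label; real incorporation is partial, and the fragment sequences are invented.

```python
def end_label_count(fragment: str) -> int:
    """Ligating one labeled oligonucleotide to a fragment end adds one label,
    independent of the fragment's sequence."""
    return 1


def polymerase_label_count(transcript: str, labeled_bases: str = "UC") -> int:
    """Idealized label count for a transcript made with labeled UTP and CTP,
    assuming every U and C position is labeled (hypothetical full substitution)."""
    seq = transcript.upper()
    return sum(seq.count(base) for base in labeled_bases)


fragments = ["AAAAGAAAAG", "UCUCUCAGUC", "GACUGACUGA"]
print([end_label_count(f) for f in fragments])         # [1, 1, 1]
print([polymerase_label_count(f) for f in fragments])  # [0, 8, 4]
```

Equal-length fragments end up with anywhere from zero to eight labels under polymerase incorporation, while end-labeling gives each one a single label.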

In another embodiment, fragmented DNA can also be end-labeled using a
different procedure with a different enzyme. Terminal transferase will add
deoxynucleoside triphosphates (dNTPs), which can be labeled, to the 3' OH ends of single
stranded DNA. Single dNTPs can be added if modified nucleotides are used (for example,
dideoxynucleotide triphosphates), or multiple bases can be added if desired. DNA can be
fragmented physically (shearing), enzymatically (nucleases), or chemically (e.g.
acid hydrolysis). Following fragmentation, depending on the method, 3' OH ends may
need to be produced. The DNA fragments are then labeled using labeled dNTPs or
ddNTPs in the presence of terminal transferase.

Various other embodiments are illustrated by the Examples provided herein 
and their associated figures. 



C) Modifying Sample to Improve Signal to Noise Ratio. 

The nucleic acid sample may be modified prior to hybridization to the high
density probe array in order to reduce sample complexity, thereby decreasing background
signal and improving sensitivity of the measurement. In one embodiment, complexity
reduction for expression monitoring methods is achieved by selective degradation of
background mRNA. This is accomplished by hybridizing the sample mRNA (e.g., polyA+
RNA) with a pool of DNA oligonucleotides that hybridize specifically with the regions to
which the probes in the expression monitoring array specifically hybridize. In a preferred
embodiment, the pool of oligonucleotides consists of the same probe oligonucleotides as
found on the high density array.

The pool of oligonucleotides hybridizes to the sample mRNA, forming a
number of double stranded (hybrid duplex) nucleic acids. The hybridized sample is then
treated with RNase A, a nuclease that specifically digests single stranded RNA. The
RNase A is then inhibited, using a protease and/or commercially available RNase
inhibitors, and the double stranded nucleic acids are then separated from the digested
single stranded RNA. This separation may be accomplished in a number of ways well
known to those of skill in the art including, but not limited to, electrophoresis and gradient
centrifugation. However, in a preferred embodiment, the pool of DNA oligonucleotides is
provided attached to beads, thereby forming a nucleic acid affinity column. After digestion
with the RNase A, the hybridized DNA is removed simply by denaturing (e.g., by adding
heat or decreasing salt) the hybrid duplexes and washing the previously hybridized mRNA
off in an elution buffer.

The undigested mRNA fragments, which will be hybridized to the probes in
the high density array or other solid support, are then preferably end-labeled with a
fluorophore attached to an RNA linker using an RNA ligase. This procedure produces a
labeled sample RNA pool in which the nucleic acids that do not correspond to probes in
the array are eliminated and thus unavailable to contribute to a background signal.

Another method of reducing sample complexity involves hybridizing the
mRNA with deoxyoligonucleotides that hybridize to regions that border, on either side, the
regions to which the high density array probes are directed. Treatment with RNase H
selectively digests the RNA strand of the double stranded (hybrid duplex) regions, leaving
a pool of single-stranded mRNA corresponding to the short regions (e.g., 20 mer) that
were formerly bounded by the deoxyoligonucleotide probes and which correspond to the
targets of the high density array probes, and longer mRNA sequences that correspond to
regions between the targets of the probes of the high density array. The short RNA
fragments are then separated from the long fragments (e.g., by electrophoresis), labeled if
necessary as described above, and then are ready for hybridization with the high density
probe array.

In a third approach, sample complexity reduction involves the selective
removal of particular (preselected) mRNA messages. In particular, highly expressed
mRNA messages that are not specifically probed by the probes in the high density array are
preferably removed. This approach involves hybridizing the polyA+ mRNA with an
oligonucleotide probe that specifically hybridizes to the preselected message close to the 3'
(poly A) end. The probe may be selected to provide high specificity and low cross
reactivity. Treatment of the hybridized message/probe complex with RNase H digests the
double stranded region, effectively removing the polyA tail from the rest of the message.
The sample is then treated with methods that specifically retain or amplify polyA+ RNA
(e.g., an oligo dT column or (dT)n magnetic beads). Such methods will not retain or
amplify the selected message(s) as they are no longer associated with a polyA tail. These
highly expressed messages are effectively removed from the sample, providing a sample
that has reduced background mRNA.



IV. Hybridization Array Design. 
A) Probe Composition. 

One of skill in the art will appreciate that an enormous number of array
designs are suitable for the practice of this invention. Generic difference screening arrays,
for example, may include random, haphazardly selected, or arbitrary probe sets.
Alternatively, the generic difference screening arrays may include all possible
oligonucleotides of a particular pre-selected length. Conversely, other expression
monitoring arrays typically include a number of probes that specifically hybridize to the
nucleic acid(s) whose expression is to be detected. In a preferred embodiment, the array
will include one or more control probes.



1) Test probes. 

In its simplest embodiment, the high density array includes "test probes"
(also referred to as probe oligonucleotides) more than 5 bases long, preferably more than
10 bases long, and some more than 40 bases long. In some embodiments, the probes are
less than 50 bases long. In some cases, these oligonucleotides range from about 5 to about
45 or 5 to about 50 nucleotides long, more preferably from about 10 to about 40
nucleotides long, and most preferably from about 15 to about 40 nucleotides in length. In
other particularly preferred embodiments the probes are 20 or 25 nucleotides in length. In
preselected expression monitoring arrays, these probe oligonucleotides have sequences
complementary to particular subsequences of the genes whose expression they are
designed to detect. Thus, the test probes are capable of specifically hybridizing to the
target nucleic acid they are to detect.
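For preselected expression monitoring arrays, candidate test probes complementary to subsequences of a target can be enumerated by sliding a window along the target and taking the reverse complement of each window. This is a minimal sketch: the function names and the example target are hypothetical, the 25-nt default simply reflects one preferred length above, and no filtering for specificity or cross-hybridization is attempted.

```python
# Maps each target base (DNA or RNA) to the DNA base that pairs with it.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """DNA sequence (5'->3') that base-pairs antiparallel with seq."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tile_probes(target: str, probe_length: int = 25, step: int = 1) -> list[str]:
    """Probes complementary to successive subsequences (windows) of the target."""
    target = target.upper()
    last_start = len(target) - probe_length
    return [reverse_complement(target[i:i + probe_length])
            for i in range(0, last_start + 1, step)]


print(tile_probes("AUGGCCAUUGUAAUG", probe_length=5, step=5))
# ['GCCAT', 'CAATG', 'CATTA']
```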
In high density oligonucleotide arrays designed for generic difference
screening, the probe oligonucleotides need not be selected to hybridize to particular
preselected subsequences of genes. To the contrary, preferred generic difference screening
arrays comprise probe oligonucleotides whose sequences are random, arbitrary, or
haphazard. Alternatively, the probe oligonucleotides may include all possible nucleotides
of a given length (e.g., all possible 4 mers, all possible 5 mers, all possible 6 mers, all
possible 7 mers, all possible 8 mers, all possible 9 mers, all possible 10 mers, all possible
11 mers, all possible 12 mers, etc.).

A random oligonucleotide array is an array in which the pool of nucleotide
sequences of a particular length does not significantly deviate from a pool of nucleotide
sequences selected in a random manner (i.e., blind, unbiased selection) from a collection of
all possible sequences of that length.
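A blind, unbiased selection can be simulated by choosing each base uniformly and independently, which samples uniformly (with replacement) from all sequences of the chosen length. The sketch below also computes one simple test of "does not significantly deviate": a chi-square statistic on pooled base composition. Function names are hypothetical, and a uniform base composition is necessary but not sufficient evidence of randomness.

```python
import random
from collections import Counter

# Upper 5% point of the chi-square distribution with 3 degrees of freedom
# (four bases compared against equal expected frequencies).
CHI2_CRITICAL_3DF_5PCT = 7.815


def random_probes(n: int, length: int, alphabet: str = "ACGT", seed=None) -> list[str]:
    """Uniform, independent choice at each position; duplicates are possible."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(n)]


def base_composition_chi2(probes: list[str], alphabet: str = "ACGT") -> float:
    """Pearson chi-square statistic of pooled base counts against equal base frequencies."""
    counts = Counter(base for probe in probes for base in probe)
    total = sum(counts[b] for b in alphabet)
    expected = total / len(alphabet)
    return sum((counts[b] - expected) ** 2 / expected for b in alphabet)


pool = random_probes(1000, 20, seed=7)
stat = base_composition_chi2(pool)
print(len(pool), round(stat, 2), stat < CHI2_CRITICAL_3DF_5PCT)
```

A statistic above the critical value would flag a pool whose composition deviates from random at roughly the 5% level.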

Arbitrary or haphazard nucleotide arrays of probe oligonucleotides are
arrays in which the probe oligonucleotides are selected without identifying and/or
preselecting target nucleic acids. Arbitrary or haphazard nucleotide arrays may
approximate or even be random; however, there is no assurance that they meet a statistical
definition of randomness.

The arrays may reflect some nucleotide selection based on probe
composition, and/or non-redundancy of probes, and/or coding sequence bias as described
herein. In a preferred embodiment, however, such "biased" probe sets are still not chosen
to be specific for any particular genes.

An array comprising all possible oligonucleotides of a particular length
refers to an array that contains oligonucleotides having sequences corresponding to
substantially every permutation of a sequence. Thus, since the probe oligonucleotides of
this invention preferably include up to 4 bases (A, G, C, T) or (A, G, C, U) or derivatives
of these bases, an array having all possible nucleotides of length X contains substantially
4^X different nucleic acids (e.g., 16 different nucleic acids for a 2 mer, 64 different nucleic
acids for a 3 mer, 65536 different nucleic acids for an 8 mer, etc.). It will be appreciated
that some small number of sequences may be inadvertently absent from a pool of all
possible nucleotides of a particular length due to synthesis problems, inadvertent cleavage,
etc. Thus, it will be appreciated that an array comprising all possible nucleotides of
length X refers to an array having substantially all possible nucleotides of length X.
Substantially all possible nucleotides of length X includes more than 90%, typically more
than 95%, preferably more than 98%, more preferably more than 99%, and most preferably
more than 99.9% of the possible number of different nucleotides.
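The 4^X count and the "substantially all" coverage thresholds can be checked directly. This sketch enumerates every sequence of length k with `itertools.product` and reports what fraction of them a given probe set contains; the function names and the 50 "missing" sequences are hypothetical.

```python
from itertools import product


def all_kmers(k: int, alphabet: str = "ACGT") -> list[str]:
    """Every sequence of length k over the alphabet: len(alphabet)**k of them."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def coverage(probes, k: int, alphabet: str = "ACGT") -> float:
    """Fraction of the len(alphabet)**k possible k-mers present among the probes."""
    allowed = set(alphabet)
    present = {p for p in probes if len(p) == k and set(p) <= allowed}
    return len(present) / len(alphabet) ** k


for k in (2, 3, 8):
    print(k, len(all_kmers(k)))  # prints: 2 16 / 3 64 / 8 65536

# An 8-mer array missing 50 sequences (e.g., from synthesis failures):
synthesized = all_kmers(8)[50:]
print(round(coverage(synthesized, 8), 4))  # 0.9992
```

Such an array still clears the "most preferably more than 99.9%" threshold.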

The probe oligonucleotides described above can additionally include a
constant domain. A constant domain is a nucleotide subsequence that is common to
substantially all of the probe oligonucleotides. Particularly preferred constant domains are
located at the terminus of the oligonucleotide probe closest to the substrate (i.e., attached to
the linker/anchor molecule). The constant regions may comprise virtually any sequence.
However, in one embodiment, the constant regions comprise a sequence or subsequence
complementary to the sense or antisense strand of a restriction site (a nucleic acid sequence
recognized by a restriction endonuclease).

The constant domain can be synthesized de novo on the array.
Alternatively, the constant region may be prepared in a separate procedure and then
coupled intact to the array. Since the constant domain can be synthesized separately and
then the intact constant subsequences coupled to the high density array, the constant
domain can be virtually any length. Some constant domains range from 3 nucleotides to
about 500 nucleotides in length, more typically from about 5 nucleotides in length to about
100 nucleotides in length, most typically from 3 nucleotides in length to about 50
nucleotides in length. In particular embodiments, constant domains range from 3
nucleotides to about 45 nucleotides in length, more preferably from 3 nucleotides in length
to about 25 nucleotides in length and most preferably from 3 to about 15 or even 10
nucleotides in length. In other embodiments, preferred constant regions range from about
5 nucleotides to about 15 nucleotides in length.

In addition to test probes that bind the target nucleic acid(s) of interest, the
high density array can contain a number of control probes. The control probes fall into
three categories referred to herein as 1) Normalization controls; 2) Expression level
controls; and 3) Mismatch controls.
2) Normalization controls.

Normalization controls are oligonucleotide probes that are perfectly
complementary to labeled reference oligonucleotides that are added to the nucleic acid
sample. The signals obtained from the normalization controls after hybridization provide a
control for variations in hybridization conditions, label intensity, "reading" efficiency and
other factors that may cause the signal of a perfect hybridization to vary between arrays. In
a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in
the array are divided by the signal (e.g., fluorescence intensity) from the control probes,
thereby normalizing the measurements.
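The normalization step is a division by the control signal. This sketch assumes that, when several normalization control probes are present, their signals are averaged into a single divisor; that averaging choice, the probe identifiers and the intensities are all hypothetical.

```python
from statistics import mean


def normalize_to_controls(signals: dict[str, float], control_ids: set[str]) -> dict[str, float]:
    """Divide every non-control probe signal by the mean normalization-control signal."""
    control_signal = mean(signals[c] for c in control_ids)
    if control_signal <= 0:
        raise ValueError("normalization controls must have a positive signal")
    return {pid: s / control_signal for pid, s in signals.items() if pid not in control_ids}


array_signals = {"norm_1": 1900.0, "norm_2": 2100.0, "probe_a": 500.0, "probe_b": 4000.0}
print(normalize_to_controls(array_signals, {"norm_1", "norm_2"}))
# {'probe_a': 0.25, 'probe_b': 2.0}
```

Values normalized this way can be compared across arrays whose overall brightness differs.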

Virtually any probe may serve as a normalization control. However, it is
recognized that hybridization efficiency varies with base composition and probe length.
Preferred normalization probes are selected to reflect the average length of the other probes
present in the array; however, they can be selected to cover a range of lengths. The
normalization control(s) can also be selected to reflect the (average) base composition of
the other probes in the array; however, in a preferred embodiment, only one or a few
normalization probes are used and they are selected such that they hybridize well (i.e., no
secondary structure) and do not match any target-specific probes.

Normalization probes can be localized at any position in the array or at
multiple positions throughout the array to control for spatial variation in hybridization
efficiency. In a preferred embodiment, the normalization controls are located at the
corners or edges of the array as well as in the middle.

3) Expression level controls. 

Expression level controls are probes that hybridize specifically with
constitutively expressed genes in the biological sample. Expression level controls are
designed to control for the overall health and metabolic activity of a cell. Examination of
the covariance of an expression level control with the expression level of the target nucleic
acid indicates whether measured changes or variations in the expression level of a gene are
due to changes in the transcription rate of that gene or to general variations in the health of
the cell. Thus, for example, when a cell is in poor health or lacking a critical metabolite, the
expression levels of both an active target gene and a constitutively expressed gene are
expected to decrease. The converse is also true. Thus, where the expression levels of both
an expression level control and the target gene appear to both decrease or to both increase,
the change may be attributed to changes in the metabolic activity of the cell as a whole, not
to differential expression of the target gene in question. Conversely, where the expression
levels of the target gene and the expression level control do not covary, the variation in the
expression level of the target gene is attributed to differences in regulation of that gene and
not to overall variations in the metabolic activity of the cell.
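The covariance reasoning above can be written as a small decision rule comparing the direction of change of a target with that of an expression level control between two samples. This is a sketch only: the two-fold (log2 = 1) threshold for calling a change, the function name and the example values are hypothetical.

```python
import math


def interpret_change(target_before: float, target_after: float,
                     control_before: float, control_after: float,
                     min_log2_change: float = 1.0) -> str:
    """Classify a target's change using an expression level control measured
    in the same two samples."""
    target_lfc = math.log2(target_after / target_before)
    control_lfc = math.log2(control_after / control_before)

    if abs(target_lfc) < min_log2_change:
        return "no significant change"
    # Target and control covary: both changed, and in the same direction.
    covaries = (abs(control_lfc) >= min_log2_change
                and math.copysign(1, control_lfc) == math.copysign(1, target_lfc))
    if covaries:
        return "attributed to overall cell state"
    return "attributed to regulation of the target gene"


print(interpret_change(100, 25, 1000, 250))     # both fall four-fold
print(interpret_change(100, 400, 1000, 1000))   # control unchanged
```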

Virtually any constitutively expressed gene provides a suitable target for
expression level controls. Typically, expression level control probes have sequences
complementary to subsequences of constitutively expressed "housekeeping genes"
including, but not limited to, the β-actin gene, the transferrin receptor gene, the GAPDH
gene, and the like.

4) Mismatch controls. 

Mismatch controls may also be provided for the probes to the target genes,
for expression level controls, or for normalization controls. Mismatch controls are
oligonucleotide probes identical to their corresponding test or control probes except for the
presence of one or more mismatched bases. A mismatched base is a base selected so that it
is not complementary to the corresponding base in the target sequence to which the probe
would otherwise specifically hybridize. One or more mismatches are selected such that
under appropriate hybridization conditions (e.g. stringent conditions) the test or control
probe would be expected to hybridize with its target sequence, but the mismatch probe
would not hybridize (or would hybridize to a significantly lesser extent). Preferred
mismatch probes contain a central mismatch. Thus, for example, where a probe is a 20
mer, a corresponding mismatch probe will have the identical sequence except for a single
base mismatch (e.g., substituting a G, a C or a T for an A) at any of positions 6 through 14
(the central mismatch).
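A central mismatch control can be derived from its perfect match probe by substituting a single base. In this sketch the substitute is the complementary base, one convenient convention chosen here and not required by the text, which allows any different base. The function name and the 20-mer are hypothetical.

```python
from typing import Optional

# Substitution used for the mismatched base (a convention chosen for this sketch).
_SUBSTITUTE = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch_probe(probe: str, position: Optional[int] = None) -> str:
    """Return the probe with one base substituted at a 1-based position
    (default: the central base, e.g. position 10 of a 20-mer)."""
    probe = probe.upper()
    if position is None:
        position = (len(probe) + 1) // 2
    i = position - 1
    return probe[:i] + _SUBSTITUTE[probe[i]] + probe[i + 1:]


pm = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer perfect match probe
mm = central_mismatch_probe(pm)
print(mm)  # ACGTACGTAGGTACGTACGT
```

Passing any `position` from 6 to 14 produces one of the other central mismatches allowed for a 20-mer.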

In "generic" (e.g., random, arbitrary, haphazard, etc.) arrays, since the target
nucleic acid(s) are unknown, perfect match and mismatch probes cannot be a priori
determined, designed, or selected. In this instance, the probes are preferably provided as
pairs where each pair of probes differs in one or more preselected nucleotides. Thus, while
it is not known a priori which of the probes in the pair is the perfect match, it is known
that when one probe specifically hybridizes to a particular target sequence, the other probe
of the pair will act as a mismatch control for that target sequence. It will be appreciated
that the perfect match and mismatch probes need not be provided as pairs, but may be
provided as larger collections (e.g., 3, 4, 5, or more) of probes that differ from each other
in particular preselected nucleotides.
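For generic arrays, a collection of probes that differ only at a preselected position can be generated up front, and after hybridization one member can be treated as the putative perfect match. This sketch assumes that the brightest member plays that role and that the others serve as its mismatch controls. The function names, the probe and the intensities are hypothetical.

```python
from typing import Optional


def probe_collection(probe: str, position: Optional[int] = None,
                     alphabet: str = "ACGT") -> list[str]:
    """Probes identical except at one preselected 1-based position (default: central),
    one probe per base in the alphabet. A two-letter alphabet gives a pair."""
    probe = probe.upper()
    if position is None:
        position = (len(probe) + 1) // 2
    i = position - 1
    return [probe[:i] + base + probe[i + 1:] for base in alphabet]


def assign_pm_and_mm(intensities: dict[str, float]) -> tuple[str, list[str]]:
    """Treat the brightest member as the putative perfect match and the
    remaining members as its mismatch controls."""
    pm = max(intensities, key=intensities.get)
    return pm, [p for p in intensities if p != pm]


members = probe_collection("GATTACA")  # position 4 varies
signals = dict(zip(members, [120.0, 95.0, 110.0, 1480.0]))
print(members)
print(assign_pm_and_mm(signals))
```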

In both expression monitoring and generic difference screening arrays, 
mismatch probes provide a control for non-specific binding or cross-hybridization to a 
nucleic acid in the sample other than the target to which the probe is complementary. 
Mismatch probes thus indicate whether a hybridization is specific or not. For example, if
the complementary target is present, the perfect match probes should be consistently
brighter than the mismatch probes. In addition, if all central mismatches are present, the
mismatch probes can be used to detect a mutation. Finally, it was also a discovery of the 
present invention that the difference in intensity between the perfect match and the 
mismatch probe (I(PM)-I(MM)) provides a good measure of the concentration of the 
hybridized material. 
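The I(PM) - I(MM) measure, together with the check that perfect match probes are consistently brighter, can be computed per target from its probe pairs. This sketch assumes that the difference is averaged over all pairs directed at one target; that aggregation, the function names and the intensities are hypothetical.

```python
from statistics import mean


def pm_minus_mm(pairs: list[tuple[float, float]]) -> float:
    """Average I(PM) - I(MM) over the probe pairs directed at one target."""
    return mean(pm - mm for pm, mm in pairs)


def fraction_pm_brighter(pairs: list[tuple[float, float]]) -> float:
    """Fraction of pairs in which the perfect match outshines its mismatch."""
    return sum(pm > mm for pm, mm in pairs) / len(pairs)


pairs = [(820.0, 140.0), (650.0, 210.0), (900.0, 120.0), (300.0, 260.0)]
print(pm_minus_mm(pairs), fraction_pm_brighter(pairs))  # 485.0 1.0
```

A fraction near 1 supports a specific hybridization, and the averaged difference serves as the concentration-related measure.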

5) Sample preparation/amplification/quantitation controls.

The high density array may also include sample preparation/amplification
control probes. These are probes that are complementary to subsequences of control genes
selected because they do not normally occur in the nucleic acids of the particular biological
sample being assayed. Suitable sample preparation/amplification control probes include,
for example, probes to bacterial genes (e.g., BioB) where the sample in question is a
biological sample from a eukaryote.

The RNA sample is then spiked with a known amount of the nucleic acid to
which the sample preparation/amplification control probe is directed before processing.
Quantification of the hybridization of the sample preparation/amplification control probe
then provides a measure of alteration in the abundance of the nucleic acids caused by
processing steps (e.g., PCR, reverse transcription, in vitro transcription, etc.).

Quantitation controls are similar. Typically they are combined with the
sample nucleic acid(s) in known amounts prior to hybridization. They are useful to
provide a quantitation reference and permit determination of a standard curve for
quantifying hybridization amounts (concentrations).
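The two control types can work together: quantitation controls define a standard curve, and the curve converts the signal of a spiked sample preparation control back into an amount that can be compared with what was added. This sketch assumes a straight-line fit in log-log space, which is one common choice and not specified in the text. The function names, the arbitrary concentration units and all values are hypothetical.

```python
import math


def fit_standard_curve(concentrations, intensities):
    """Least-squares line through (log10 concentration, log10 intensity) points
    from the quantitation controls; returns (slope, intercept)."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(v) for v in intensities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the standard curve to estimate a concentration from a measured intensity."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)


# Hypothetical quantitation controls (arbitrary concentration units) and their intensities.
slope, intercept = fit_standard_curve([1, 10, 100], [20, 200, 2000])

# A sample preparation control: 50 units spiked in before processing, read back at intensity 500.
measured = concentration_from_intensity(500, slope, intercept)
recovery = measured / 50
print(round(measured, 3), round(recovery, 3))  # 25.0 0.5
```

Here a recovery of 0.5 would indicate that processing halved the abundance of the spiked control.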

B) Probe Selection and Optimization.
i) Generic difference screening arrays

a) Assumption-free probe selection.

As explained above, probe oligonucleotide selection for generic difference screening arrays
can be random, arbitrary, haphazard, composition biased, or include all possible
oligonucleotides of a particular length. Probe choice is thus essentially assumption free. In
some embodiments, however, particular oligonucleotides may be excluded from the array or from
analysis. For example, probes that contain palindromic sequences or probes that contain long
stretches of all As, Cs, Gs, Ts, etc., may be excluded. Probes for exclusion may be identified
by hybridizing a single array to the same sample multiple times and/or hybridizing different
copies of the array to the same sample. Probes that show an unacceptable variation (variation
above a particular threshold value) in hybridization intensity against the same sample may be
excluded (either in array construction or in signal analysis). The variation level at which a
probe may be excluded is a function of the sensitivity desired of the assay: the more
sensitive the assay is desired to be, the lower the exclusion threshold is set. In a preferred
embodiment, the probe is excluded when the variation in hybridization intensity exceeds 2
times the background signal and has a relative variation of more than 50%.
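Below is a minimal sketch of the replicate-based exclusion test, written in C like the listing later in this section. It makes two assumptions about terms the text leaves undefined:

- "Variation" is taken to mean the sample standard deviation of one probe's intensities across replicate hybridizations.
- "Relative variation" is taken to mean the coefficient of variation (standard deviation divided by mean).

The function name `exclude_probe` is hypothetical. Both comparisons are made on variances, so no square root is needed.

```c
#include <stddef.h>

/* Returns 1 if the probe should be excluded, 0 otherwise.
 * x: the probe's intensity in n replicate hybridizations to one sample.
 * Assumed criterion: sample SD > 2 * background AND SD / mean > 0.5. */
int exclude_probe(const double *x, size_t n, double background)
{
    double sum = 0.0, ss = 0.0, mean, var;
    size_t i;

    if (n < 2)
        return 0;                      /* too few replicates to judge */
    for (i = 0; i < n; i++)
        sum += x[i];
    mean = sum / (double)n;
    for (i = 0; i < n; i++)
        ss += (x[i] - mean) * (x[i] - mean);
    var = ss / (double)(n - 1);        /* sample variance */

    /* SD > 2 * background  <=>  var > 4 * background^2 (background >= 0) */
    int high_absolute = var > 4.0 * background * background;
    /* SD / mean > 0.5  <=>  var > 0.25 * mean^2 (for mean > 0) */
    int high_relative = mean <= 0.0 || var > 0.25 * mean * mean;
    return high_absolute && high_relative;
}
```

Requiring both conditions keeps a bright but proportionally stable probe on the array even when its absolute spread is large.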

Alternatively, such exclusion may be inherent in the selective identification of
differentially hybridizing sequences, where the difference between a test nucleic acid sample
and a reference nucleic acid sample is compared to the difference between the reference
nucleic acid sample and itself. This is described more fully below in Section IX(B).

b) Exploitation of codon degeneracy.

In another embodiment, species-specific codon usage can be exploited to utilize a longer (and
hence more specific and stable) probe without increasing the number of probe oligonucleotides
necessary to hybridize to all possible sequences. Amino acid codons are conserved in the first
and second positions, while the third position is highly redundant. Moreover, each species or
organism favors particular codons to encode any particular amino acid; the preferred codon for
a particular amino acid in a particular species is the codon used at the highest frequency in
that species. Codon preferences are well known to those of skill in the art. They can also be
readily determined by a simple frequency analysis of the nucleotide sequences of a particular
organism or species.
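The "simple frequency analysis" mentioned above can be sketched as follows. The sketch makes two assumptions:

- The input is an in-frame coding sequence.
- The caller supplies the synonymous codons for each amino acid, since the genetic-code table itself is not reproduced here.

The function names are illustrative and are not taken from the source.

```c
#include <stddef.h>
#include <string.h>

/* A=0, C=1, G=2, T/U=3 (either case); -1 for anything else. */
static int base_code(char b)
{
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': case 'U': case 'u': return 3;
    default: return -1;
    }
}

/* Add the codons of an in-frame coding sequence to counts[64] (zeroed by
 * the caller), indexed as 16*first + 4*second + third. Codons containing
 * other characters are skipped. */
void count_codons(const char *cds, long counts[64])
{
    size_t len = strlen(cds);
    for (size_t i = 0; i + 3 <= len; i += 3) {
        int a = base_code(cds[i]), b = base_code(cds[i + 1]), c = base_code(cds[i + 2]);
        if (a >= 0 && b >= 0 && c >= 0)
            counts[16 * a + 4 * b + c]++;
    }
}

/* Index (0..nsyn-1) of the most frequently used codon among caller-supplied
 * three-letter synonymous codons, e.g. {"GCT", "GCC", "GCA", "GCG"} for
 * alanine. Ties go to the earlier entry. */
int preferred_codon(const long counts[64], const char *const *synonyms, int nsyn)
{
    int best = 0;
    long best_count = -1;
    for (int k = 0; k < nsyn; k++) {
        const char *s = synonyms[k];
        int a = base_code(s[0]), b = base_code(s[1]), c = base_code(s[2]);
        long n = (a >= 0 && b >= 0 && c >= 0) ? counts[16 * a + 4 * b + c] : 0;
        if (n > best_count) {
            best_count = n;
            best = k;
        }
    }
    return best;
}
```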

Similarly, the di-, tri-, and tetranucleotide frequency biases of a particular organism or
species can be used to weight the selection of oligonucleotide probes used in a "composition
biased" generic difference screening array.

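A similar count gives the di-, tri- or tetranucleotide frequencies used for composition biasing. This sketch assumes uppercase A, C, G, T input and only tallies overlapping k-mers. How the resulting frequencies are turned into selection weights is not specified in the text and is left open here.

```c
#include <stddef.h>
#include <string.h>

/* Tally overlapping k-mers (1 <= k <= 4) of an uppercase DNA sequence into
 * counts[4^k], using base-4 indices with A=0, C=1, G=2, T=3. Windows that
 * contain any other character are skipped. Returns the number of k-mers
 * counted, i.e. the denominator for converting counts to frequencies. */
long count_kmers(const char *seq, int k, long *counts)
{
    static const char alphabet[] = "ACGT";
    size_t len = strlen(seq);
    long total = 0;

    if (k < 1 || k > 4)
        return 0;
    for (size_t i = 0; i + (size_t)k <= len; i++) {
        int code = 0, ok = 1;
        for (int j = 0; j < k && ok; j++) {
            const char *p = strchr(alphabet, seq[i + j]);
            if (p == NULL)
                ok = 0;
            else
                code = 4 * code + (int)(p - alphabet);
        }
        if (ok) {
            counts[code]++;
            total++;
        }
    }
    return total;
}
```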
In one preferred embodiment, the probe oligonucleotides are prepared with the first two
nucleotides of each codon fixed but the third nucleotide allowed to vary (either by use of a
4-way wobble or by use of inosine or another non-specifically hybridizing base). In a
preferred embodiment, each codon of the probe will have the general formula

3'-X1-X2-I-5'

where I is inosine or a 4-way wobble and X1 and X2 are A, G, C, or T/U selected according to
the preferred codon usage for a particular species. Thus, for example, an array of 16 mers
that will hybridize to substantially all nucleic acids of a particular species can be prepared
where the probes have the formula:

with only 4^10 different probe oligonucleotides. Suitable codons for this probe are
illustrated in Table 1.
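The probe count can be checked with a short calculation. It assumes that each of the five codons contributes two specified positions (X1 and X2) and that every remaining position of the 16 mer is inosine or a wobble:

```latex
N_{\text{probes}} = 4^{2 \times 5} = 4^{10} = 1{,}048{,}576,
\qquad
\frac{4^{16}}{4^{10}} = 4^{6} = 4{,}096 .
```

On this assumption, the codon-based design needs 4,096 times fewer probes than a complete set of 4^16 = 4,294,967,296 fully specified 16 mers.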

Table 1. Preferred sequences for generic coding sequence 16 mer probe oligonucleotides.
(Derived from the standard table of amino acid codons, i.e., the genetic code.)



[Table 1 body not legible in this copy. Columns: Codon 5, Codon 4, Codon 3, Codon 2, Codon 1.]



The affinity of the probes may be further enhanced by the inclusion of additional inosines
(or 4-way, 3-way, or 2-way wobbles, or other generic bases) at the 3' and 5' ends of the
oligonucleotide probes. These codon-usage-biased probes can be used in conjunction with ligase
discrimination to further increase the obtainable sequence information. Thus, for example,
where hybridization to an array comprising the above-described 16 mers also includes a
ligation with one or more ligatable oligonucleotides of fixed length N whose sequence is
known, each successful ligation provides 16 + N nucleotides of sequence information.



ii) Expression monitoring arrays. 

In a preferred embodiment, oligonucleotide probes in the expression monitoring high density
array are selected to bind specifically to the nucleic acid target to which they are directed,
with minimal non-specific binding or cross-hybridization under the particular hybridization
conditions utilized. Because the high density arrays of this invention can contain in excess
of 1,000,000 different probes, it is possible to provide every probe of a characteristic
length that binds to a particular nucleic acid sequence. Thus, for example, the high density
array can contain every possible 20 mer sequence complementary to an IL-2 mRNA.

There may exist, however, 20 mer subsequences that are not unique to the IL-2 mRNA. Probes
directed to these subsequences are expected to cross-hybridize with occurrences of their
complementary sequence in other regions of the sample genome. Similarly, other probes simply
may not hybridize effectively under the hybridization conditions (e.g., due to secondary
structure, or interactions with the substrate or other probes). Thus, in a preferred
embodiment, the probes that show such poor specificity or hybridization efficiency are
identified and may be excluded either from the high density array itself (e.g., during
fabrication of the array) or from the post-hybridization data analysis.
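A minimal sketch of enumerating a complete probe tiling of an mRNA such as IL-2 follows. It rests on two assumptions:

- The target is written 5'->3', with U read as T.
- Each probe is written 5'->3' as the reverse complement of one window of the target.

The names `tiling_count` and `probe_at` are hypothetical. An mRNA of length L yields L - k + 1 probes of length k; for example, a 1,000 nucleotide transcript has 981 distinct 20 mer probes.

```c
#include <stddef.h>
#include <string.h>

/* Watson-Crick complement; U is treated like T, anything else maps to N. */
static char complement(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'T': case 'U': return 'A';
    case 'G': return 'C';
    case 'C': return 'G';
    default:  return 'N';
    }
}

/* Number of probes of length k in a complete tiling of target. */
size_t tiling_count(const char *target, size_t k)
{
    size_t len = strlen(target);
    return (k > 0 && len >= k) ? len - k + 1 : 0;
}

/* Write into out (k + 1 bytes) the k mer probe complementary to
 * target[start .. start + k - 1], reading 5'->3'. */
void probe_at(const char *target, size_t start, size_t k, char *out)
{
    for (size_t i = 0; i < k; i++)
        out[i] = complement(target[start + k - 1 - i]);
    out[k] = '\0';
}
```

A full tiling loops `start` from 0 to `tiling_count(mrna, 20) - 1`, calling `probe_at` each time. The non-unique and poorly hybridizing probes described above are then removed from that list.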

In addition, in a preferred embodiment, expression monitoring arrays are used to identify the
presence and expression (transcription) level of genes which are several hundred base pairs
long or longer. For most applications it would be useful to identify the presence, absence, or
expression level of several thousand to one hundred thousand genes. Because the number of
oligonucleotides per array is limited, in a preferred embodiment only a limited set of probes
specific to each gene whose expression is to be detected is included.

a) Hybridization and cross-hybridization data. 

Thus, in one embodiment, this invention provides a method of optimizing a probe set for
detection of a particular gene. Generally, this method involves providing a high density array
containing a multiplicity of probes of one or more particular length(s) that are complementary
to subsequences of the mRNA transcribed by the target gene. In one embodiment the high density
array may contain every probe of a particular length that is complementary to a particular
mRNA. The probes of the high density array are then hybridized with their target nucleic acid
alone and then hybridized with a high complexity, high concentration nucleic acid sample that
does not contain the targets complementary to the probes. Thus, for example, where the target
nucleic acid is an RNA, the probes are first hybridized with their target nucleic acid alone
and then hybridized with RNA made from a cDNA library (e.g., reverse transcribed poly A+ mRNA),
where the sense of the hybridized RNA is opposite that of the target nucleic acid (to ensure
that the high complexity sample does not contain targets for the probes). Those probes that
show a strong hybridization signal with their target and little or no cross-hybridization with
the high complexity sample are preferred probes for use in the high density arrays of this
invention.

The high density array may additionally contain mismatch controls for each of the probes to be
tested. In a preferred embodiment, the mismatch controls contain a central mismatch. Where both
the mismatch control and the target probe show high levels of hybridization (e.g., the
hybridization to the mismatch is nearly equal to or greater than the hybridization to the
corresponding test probe), the test probe is preferably not used in the high density array.

In a particularly preferred embodiment, optimal probes are selected according to the following
method. First, as indicated above, an array is provided containing a multiplicity of
oligonucleotide probes complementary to subsequences of the target nucleic acid. The
oligonucleotide probes may be of a single length or may span a variety of lengths. The high
density array may contain every probe of a particular length that is complementary to a
particular mRNA or may contain probes selected from various regions of particular mRNAs. For
each target-specific probe the array also contains a mismatch control probe, preferably a
central mismatch control probe.

The oligonucleotide array is hybridized to a sample containing target nucleic acids having
subsequences complementary to the oligonucleotide probes, and the difference in hybridization
intensity between each probe and its mismatch control is determined. Only those probes where
the difference between the probe and its mismatch control exceeds a threshold hybridization
intensity (e.g., preferably greater than 10% of the background signal intensity, more
preferably greater than 20% of the background signal intensity, and most preferably greater
than 50% of the background signal intensity) are selected. Thus, only probes that show a strong
signal compared to their mismatch control are selected.
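The first selection round can be sketched as a single pass over probe/mismatch pairs. The struct layout and the `fraction` parameter are illustrative assumptions; `fraction` takes the values 0.1, 0.2 or 0.5 for the three preference levels above.

```c
#include <stddef.h>

/* One probe and its central-mismatch control, measured against a sample
 * containing the target. Field names are hypothetical. */
typedef struct {
    double pm;  /* perfect-match probe intensity */
    double mm;  /* mismatch control intensity    */
} ProbePair;

/* Sets keep[i] = 1 when pm - mm exceeds fraction * background, else 0.
 * Returns the number of probes kept. */
size_t select_by_mismatch(const ProbePair *pairs, size_t n, double background,
                          double fraction, int *keep)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = (pairs[i].pm - pairs[i].mm) > fraction * background;
        kept += (size_t)keep[i];
    }
    return kept;
}
```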

The probe optimization procedure can optionally include a second round of selection. In this
selection, the oligonucleotide probe array is hybridized with a nucleic acid sample that is not
expected to contain sequences complementary to the probes. Thus, for example, where the probes
are complementary to the RNA sense strand, a sample of antisense RNA is provided. Of course,
other samples could be provided, such as samples from organisms or cell lines known to lack a
particular gene or known not to express a particular gene.

Only those probes where both the probe and its mismatch control show hybridization intensities
below a threshold value (e.g., less than about 5 times the background signal intensity,
preferably equal to or less than about 2 times the background signal intensity, more
preferably equal to or less than about 1 times the background signal intensity, and most
preferably equal to or less than about half the background signal intensity) are selected. In
this way probes that show minimal non-specific binding are selected. Finally, in a preferred
embodiment, the n probes (where n is the number of probes desired for each target gene) that
pass both selection criteria and have the highest hybridization intensity for each target gene
are selected for incorporation into the array or, where already present in the array, for
subsequent data analysis. Of course, one of skill in the art will appreciate that either
selection criterion could be used alone for selection of probes.
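The second round and the final choice of the n best probes per gene might be combined as below. This sketch rests on three assumptions:

- One gene's candidates are held in a single array.
- "Below a threshold" is tested with <=.
- The round-one result is already stored for each probe.

All names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int id;             /* probe identifier */
    int passed_round1;  /* result of the probe-minus-mismatch test */
    double target_pm;   /* probe intensity with the target-containing sample */
    double null_pm;     /* probe intensity with the non-target sample */
    double null_mm;     /* mismatch intensity with the non-target sample */
} Candidate;

/* qsort comparator: descending target intensity. */
static int by_target_desc(const void *a, const void *b)
{
    double x = ((const Candidate *)a)->target_pm;
    double y = ((const Candidate *)b)->target_pm;
    return (x < y) - (x > y);
}

/* Keep candidates that passed round 1 and whose probe and mismatch signals
 * with the non-target sample are both <= null_factor * background.
 * Survivors are compacted to the front of c and sorted brightest first;
 * returns how many of them to use (at most n). */
size_t select_top_n(Candidate *c, size_t count, double background,
                    double null_factor, size_t n)
{
    size_t m = 0;
    double limit = null_factor * background;
    for (size_t i = 0; i < count; i++)
        if (c[i].passed_round1 && c[i].null_pm <= limit && c[i].null_mm <= limit)
            c[m++] = c[i];
    qsort(c, m, sizeof *c, by_target_desc);
    return m < n ? m : n;
}
```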



b) Heuristic rules.

Using the hybridization and cross-hybridization data obtained as described above, graphs can be
made of hybridization and cross-hybridization intensities versus various probe properties,
e.g., number of As, number of Cs in a window of 8 bases, palindromic strength, etc. The graphs
can then be examined for correlations between those properties and the hybridization or
cross-hybridization intensities. Thresholds can be set beyond which hybridization appears to
be consistently poor or cross-hybridization consistently very strong. If any probe fails one
of the criteria, it is rejected from the set of probes and therefore not placed on the chip.
This will be called the heuristic rules method.

One set of rules developed for 20 mer probes in this manner is the following:

Hybridization rules:

1) Number of As is [illegible].

2) Number of Ts is less than 10 and greater than 0.

3) Maximum run of As, Gs, or Ts is less than 4 bases in a row.

4) Maximum run of any 2 bases is less than 11.

5) Palindrome score is less than 6.

6) Clumping score is less than 6.

7) Number of As + number of Ts is less than 14.

8) Number of As + number of Gs is less than 15.

With respect to rule number 4, requiring the maximum run of any two bases to be less than 11
bases guarantees that at least three different bases occur within any 12 consecutive
nucleotides. A palindrome score is the maximum number of complementary bases if the
oligonucleotide is folded over at a point that maximizes self-complementarity. Thus, for
example, a 20 mer that is perfectly self-complementary would have a palindrome score of 10. A
clumping score is the maximum number of three-mers of identical bases in a given sequence.
Thus, for example, a run of 5 identical bases will produce a clumping score of 3 (bases 1-3,
bases 2-4, and bases 3-5).

If any probe failed one of these criteria (1-8), the probe was not a member of the subset of
probes placed on the chip. For example, if a hypothetical probe was
5'-AGCTTTTTTCATGCATCTAT-3', the probe would not be synthesized on the chip because it has a run
of four or more bases (i.e., a run of six).

The cross hybridization rules developed for 20 mers were as follows: 

1) Number of Cs is less than 8.

2) Number of Cs in any window of 8 bases is less than 4.

Thus, if any probe failed either the hybridization rules (1-8) or the cross-hybridization rules
(1-2), the probe was not a member of the subset of probes placed on the chip. These rules
eliminated many of the probes that cross-hybridized strongly or exhibited low hybridization,
and performed a moderate job of eliminating weakly hybridizing probes.
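A minimal sketch of the 20 mer rules in software follows. It assumes uppercase A, C, G, T input and makes several readings of its own, which are assumptions and not the authors' code:

- Hybridization rule 1 is not implemented, because its threshold is not legible in this copy.
- Rules 7 and 8 are read as sums.
- The palindrome score ignores hairpin loop-size constraints.
- The clumping score counts every overlapping three-base run of identical bases.

```c
#include <string.h>

static int count_base(const char *s, char b)
{
    int n = 0;
    for (; *s; s++) n += (*s == b);
    return n;
}

/* Longest run of one base drawn from `allowed` (rule 3). */
static int max_single_run(const char *s, const char *allowed)
{
    int best = 0, run = 0;
    for (int i = 0; s[i]; i++) {
        int ok = strchr(allowed, s[i]) != NULL;
        run = ok ? ((i > 0 && s[i] == s[i - 1]) ? run + 1 : 1) : 0;
        if (run > best) best = run;
    }
    return best;
}

/* Longest stretch made of at most two distinct bases (rule 4). */
static int max_two_base_run(const char *s)
{
    int n = (int)strlen(s), best = 0;
    for (int i = 0; i < n; i++) {
        char a = s[i], b = 0;
        int j = i;
        while (j < n && (s[j] == a || s[j] == b || b == 0)) {
            if (s[j] != a) b = s[j];
            j++;
        }
        if (j - i > best) best = j - i;
    }
    return best;
}

static int is_pair(char x, char y)
{
    return (x == 'A' && y == 'T') || (x == 'T' && y == 'A') ||
           (x == 'G' && y == 'C') || (x == 'C' && y == 'G');
}

/* Rule 5: most complementary pairs from folding the probe at one point,
 * between two bases (gap 0) or at a base (gap 1). */
int palindrome_score(const char *s)
{
    int n = (int)strlen(s), best = 0;
    for (int k = 1; k < n; k++)
        for (int gap = 0; gap <= 1; gap++) {
            int score = 0;
            for (int i = k - 1, j = k + gap; i >= 0 && j < n; i--, j++)
                score += is_pair(s[i], s[j]);
            if (score > best) best = score;
        }
    return best;
}

/* Rule 6: positions starting three identical bases; a run of 5 scores 3. */
int clumping_score(const char *s)
{
    int n = (int)strlen(s), count = 0;
    for (int i = 0; i + 2 < n; i++)
        count += (s[i] == s[i + 1] && s[i + 1] == s[i + 2]);
    return count;
}

/* Most Cs in any window of w bases (cross-hybridization rule 2). */
static int max_c_in_window(const char *s, int w)
{
    int n = (int)strlen(s), best = 0;
    for (int i = 0; i + w <= n; i++) {
        int c = 0;
        for (int k = i; k < i + w; k++) c += (s[k] == 'C');
        if (c > best) best = c;
    }
    return best;
}

/* 1 if the probe passes every implemented rule, else 0. */
int passes_rules(const char *p)
{
    int a = count_base(p, 'A'), t = count_base(p, 'T');
    int g = count_base(p, 'G'), c = count_base(p, 'C');
    return t > 0 && t < 10                 /* hybridization rule 2 */
        && max_single_run(p, "AGT") < 4    /* rule 3 */
        && max_two_base_run(p) < 11        /* rule 4 */
        && palindrome_score(p) < 6         /* rule 5 */
        && clumping_score(p) < 6           /* rule 6 */
        && a + t < 14                      /* rule 7 */
        && a + g < 15                      /* rule 8 */
        && c < 8                           /* cross-hybridization rule 1 */
        && max_c_in_window(p, 8) < 4;      /* cross-hybridization rule 2 */
}
```

Applied to the hypothetical probe AGCTTTTTTCATGCATCTAT above, `passes_rules` returns 0 because of the run of six Ts.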

These heuristic rules may be implemented by hand calculation or, alternatively, in software as
discussed below in Section XII.

In another embodiment, a neural net can be trained to predict the hybridization and
cross-hybridization intensities based on the sequence of the probe or on other probe
properties. The neural net can then be used to pick an arbitrary number of the "best" probes.
One such neural net was developed for selecting 20 mer probes. This neural net produced a
moderate (0.7) correlation between predicted intensity and measured intensity, with a better
model for cross-hybridization than for hybridization. Details of this neural net are provided
in Example 6.



d) ANOVA model.

An analysis of variance (ANOVA) model may be built to model the intensities based on the
positions of consecutive bases (dinucleotides). This is based on the theory that the melting
energy is determined by the stacking energies of consecutive bases. The ANOVA model was used
to find correlations between the probe sequence and the hybridization and cross-hybridization
intensities. The inputs were probe sequences broken down into consecutive base pairs. One model
was made to predict hybridization and another to predict cross-hybridization. The output was
the hybridization or cross-hybridization intensity.

There were 304 (19 x 16) possible inputs, consisting of the 16 possible two-base combinations
and the 19 positions in which those combinations could be found. For example, the sequence
aggctga... has "ag" in the first position, "gg" in the second position, "gc" in the third,
"ct" in the fourth, and so on.

The resulting model assigned a component of the output intensity to each of the possible
inputs, so to estimate the intensity for a given sequence one simply adds the intensities for
each of its 19 components.
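Once fitted, the additive model is cheap to apply. This sketch makes three assumptions:

- The 304 fitted components are stored in a flat array indexed by position and dinucleotide.
- There is no separate intercept term.
- The fitting step itself is done elsewhere and is not shown.

The names are hypothetical.

```c
#include <string.h>

#define PROBE_LEN 20
#define N_POSITIONS (PROBE_LEN - 1)   /* 19 dinucleotide positions */

/* A=0, C=1, G=2, T=3 (either case); -1 for anything else. */
static int nt_index(char b)
{
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default: return -1;
    }
}

/* coef holds 19 x 16 = 304 fitted components, laid out as
 * coef[16 * position + 4 * first_base + second_base].
 * Stores the sum of the probe's 19 components in *intensity and returns 0,
 * or returns -1 if the probe is not a 20 mer of A, C, G, T. */
int predict_intensity(const char *probe, const double *coef, double *intensity)
{
    if (strlen(probe) != PROBE_LEN)
        return -1;
    double total = 0.0;
    for (int pos = 0; pos < N_POSITIONS; pos++) {
        int x = nt_index(probe[pos]), y = nt_index(probe[pos + 1]);
        if (x < 0 || y < 0)
            return -1;
        total += coef[16 * pos + 4 * x + y];
    }
    *intensity = total;
    return 0;
}
```

Separate coefficient arrays would be fitted for the hybridization and cross-hybridization models and passed to the same function.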


e) Pruning (removal) of similar probes.
One of the causes of poor signals in expression chips is that genes other than the ones being
monitored have sequences which are very similar to parts of the sequences which are being
monitored. The easiest way to solve this is to remove probes which are similar to more than one
gene. Thus, in a preferred embodiment, it is desirable to remove (prune) probes that hybridize
to transcription products of more than one gene.

The simplest pruning method is to line up a proposed probe with all known genes for the
organism being monitored, then count the number of matching bases. For example, given a probe
to gene 1 of an organism and gene 2 of an organism as follows:

probe from gene 1: aaacacaatcgaLLdLycuC

I " "1 1 1 1 1 1 1

gene 2: atctcggatcgatcggataagcgcgatcgattatgctcggcga

the probe has 8 matching bases in this alignment, but 20 matching bases in the following
alignment:

a a acgcaa t c ga t: t: a cere t c 

;;Ti!iTii!M!!iiii;i 

atctcggatcgatcggataagcgcgatcgattatgctcggcga

More complicated algorithms also exist which allow the detection of insertion or deletion
mismatches. Such sequence alignment algorithms are well known to those of skill in the art and
include, but are not limited to, BLAST, FASTA, or other gene matching programs such as those
described above in the definitions section.
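The simple line-up-and-count method can be sketched as an ungapped scan over every offset at which the probe fits inside the gene. It assumes both sequences are written on the same strand. It ignores insertions and deletions, which the more complicated algorithms above handle. The similarity cutoff at which a probe would be pruned is not specified in the text, so any threshold applied to the returned count is a user choice.

```c
#include <stddef.h>
#include <string.h>

/* Best number of identical bases between probe and gene over all ungapped
 * alignments in which the probe lies entirely within the gene.
 * Returns 0 if the probe is longer than the gene. */
int best_ungapped_matches(const char *probe, const char *gene)
{
    size_t p = strlen(probe), g = strlen(gene);
    int best = 0;
    for (size_t off = 0; off + p <= g; off++) {
        int m = 0;
        for (size_t i = 0; i < p; i++)
            m += (probe[i] == gene[off + i]);
        if (m > best) best = m;
    }
    return best;
}
```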

In another variant, where an organism has many different genes which are very similar, it is
difficult to make a probe set that measures the concentration of only one of those very similar
genes. One can then prune out any probes which are dissimilar, and make the probe set a probe
set for that family of genes.

f) Synthesis cycle pruning.

The cost of producing masks for a chip is approximately linearly related to the number of
synthesis cycles. In a normal set of genes, the distribution of the number of cycles any probe
takes to build approximates a Gaussian distribution. Because the total number of cycles is set
by the few probes in the upper tail of this distribution, the mask cost can normally be reduced
by 15% by throwing out about 3 percent of the probes. In a preferred embodiment, synthesis
cycle pruning simply involves eliminating (not including) those probes that require a greater
number of synthesis cycles than the maximum number of synthesis cycles selected for preparation
of the particular subject high density oligonucleotide array. Since the typical synthesis of
probes follows a regular pattern of bases put down (acgtacgtacgt...), counting the number of
synthesis steps needed to build a probe is easy. The listing shown in Table 1 provides typical
code for counting the number of synthesis steps required to build a probe.

Table 1. Typical code for counting synthesis cycles required for the chemical synthesis of a
probe.






#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char bases[] = "acgt";

/* Position in the synthesis order a, c, g, t, indexed by letter 'a'..'z';
 * only the entries for a, c, g and t are used. */
static const short baseIndex[26] = { 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
                                     0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0 };

short lookupIndex( char aBase ){
    if( isupper( (unsigned char)aBase ) || !isalpha( (unsigned char)aBase ) ){
        fprintf( stderr, "illegal base\n" );
        return -1;
    }
    if( strchr( bases, aBase ) == NULL ){
        fprintf( stderr, "non-dna base\n" );
        return 0;
    }
    return baseIndex[ aBase - 'a' ];
}

/* Minimum number of synthesis steps needed to build the complement of the
 * sequence in buffer when bases are added in the repeating order a, c, g, t.
 * Returns -1 for a non-DNA character or a sequence too long for buffer1. */
static short calculateMinNumberOfSynthesisStepsForComplement( const char *buffer ){
    short i, last, current, cycles = 1;
    char buffer1[40];

    for( i = 0; buffer[i] != 0; i++ ){
        if( i >= (short)( sizeof buffer1 - 1 ) ) return -1;
        switch( tolower( (unsigned char)buffer[i] ) ){
            case 'a': buffer1[i] = 't'; break;
            case 'c': buffer1[i] = 'g'; break;
            case 'g': buffer1[i] = 'c'; break;
            case 't': buffer1[i] = 'a'; break;
            default:  return -1;
        }
    }
    buffer1[i] = 0;

    if( buffer1[0] == 0 ) return 0;
    last = current = lookupIndex( buffer1[0] );
    for( i = 1; buffer1[i] != 0; i++ ){
        current = lookupIndex( buffer1[i] );
        if( current <= last ) cycles++;   /* base must wait for the next a-c-g-t cycle */
        last = current;
    }
    return (short)( ( cycles - 1 ) * 4 + current + 1 );
}
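The pruning step itself can be sketched independently of the listing above. This version assumes each probe is already written in the order and strand in which it is synthesized, so no complement is taken. It walks the repeating a-c-g-t schedule directly and gives the same step count as the listing's cycle formula applied to an already-complemented sequence. The names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of base-addition steps needed to build seq (lowercase a/c/g/t, in
 * synthesis order) when bases are offered in the fixed repeating order
 * a, c, g, t, a, c, g, t, ...  Returns -1 for any other character. */
int synthesis_steps(const char *seq)
{
    static const char order[] = "acgt";
    int step = 0;                          /* last step used so far */
    for (size_t i = 0; seq[i] != '\0'; i++) {
        const char *p = strchr(order, seq[i]);
        if (p == NULL)
            return -1;
        int phase = (int)(p - order);      /* 0..3 within one a-c-g-t cycle */
        step++;                            /* earliest possible next step */
        while ((step - 1) % 4 != phase)    /* wait until this base is offered */
            step++;
    }
    return step;
}

/* Keep only probes needing at most max_steps steps, compacting the array in
 * place; returns the number of probes kept. */
size_t prune_by_steps(const char **probes, size_t n, int max_steps)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int s = synthesis_steps(probes[i]);
        if (s >= 0 && s <= max_steps)
            probes[kept++] = probes[i];
    }
    return kept;
}
```

For example, "acgt" fits in one cycle (4 steps), while "tgca" needs 13 steps because every base after the first has to wait for the following cycle.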



g) Combination of rejection methods.
The heuristic rules, neural net and ANOVA model provide ways of pruning or reducing the number
of probes for monitoring the expression of genes. As these methods do not necessarily produce
the same results, nor entirely independent results, it may be advantageous to combine the
methods. For example, probes may be pruned or reduced if more than one method (e.g., two out
of three) indicates that the probe will not likely produce good results. Then, synthesis cycle
pruning may be performed to reduce costs.
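The two-out-of-three combination amounts to a vote. In this sketch each method's verdict is assumed to be available as a 0/1 flag; the function name is illustrative.

```c
/* flags[i] is nonzero if method i (e.g. heuristic rules, neural net, ANOVA
 * model) predicts the probe will perform poorly. The probe is rejected when
 * at least min_votes methods agree. */
int reject_by_vote(const int *flags, int n_methods, int min_votes)
{
    int votes = 0;
    for (int i = 0; i < n_methods; i++)
        votes += flags[i] != 0;
    return votes >= min_votes;
}
```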

Fig. 11 shows the flow of a process of increasing the number of probes for monitoring the
expression of genes after the number of probes has been reduced or pruned. In one embodiment,
a user is able to specify the number of nucleic acid probes that should be placed on the chip
to monitor the expression of each gene. As discussed above, it is advantageous to reduce probes
that will not likely produce good results; however, the number of probes may be reduced to
substantially less than the desired number of probes.

At step 402, the number of probes for monitoring multiple genes is reduced by the heuristic
rules method, neural net, ANOVA model, synthesis cycle pruning, or any other method, or
combination of methods. A gene is selected at step 404.

A determination is made whether the remaining probes for monitoring the selected gene number
greater than 80% (which may be varied or user defined) of the desired number of probes. If yes,
the computer system proceeds to the next gene at step 408, which will generally return to step
404.

If the remaining probes for monitoring the selected gene do not number greater than 80% of the
desired number of probes, a determination is made whether the remaining probes for monitoring
the selected gene number greater than 40% (which may be varied or user defined) of the desired
number of probes. If yes, an "i" is appended to the end of the gene name at step 412 to
indicate that, after pruning, the probes were incomplete.

At step 414, the number of probes is increased by loosening the constraints that rejected
probes. For example, the thresholds in the heuristic rules may be increased by 1. Thus, if
probes were previously rejected for having four As in a row, the rule may be loosened to five
As in a row.
A determination is then made at step 416 whether the remaining probes for monitoring the
selected gene number greater than 80% of the desired number of probes. If yes, an "r" is
appended to the end of the gene name to indicate that the rules were loosened to generate the
number of synthesized probes for that gene.

At step 420, a check is made to see if the probes for monitoring the selected gene conflict
with only one or two other genes. If yes, the full set of probes complementary to the gene (or
target sequence) is taken and pruned at step 422 so that the remaining probes are exactly
complementary to the selected gene exclusively.

A determination is then made whether the remaining probes for monitoring the selected gene
number greater than 80% of the desired number of probes at step 424. If yes, an "s" is
appended to the end of the gene name at step 426 to indicate that only a few genes were similar
to the selected gene.

At step 428, the probes for monitoring the selected gene are not reduced by conflicts at all. A
determination is then made whether the remaining probes for monitoring the selected gene number
greater than 80% of the desired number of probes at step 430. If yes, an "f" is appended to the
end of the gene name at step 432 to indicate that the probes include the whole family of probes
perfectly complementary to the gene.

If there are still not 80% of the desired number of probes, an error is reported at step 434.
Any number of error handling procedures may be undertaken. For example, an error message may be
generated for the user and the probes for the gene may not be stored. Alternatively, the user
may be prompted to enter a new desired number of probes.
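The decision flow of Fig. 11 can be summarized as a classification of each gene. This sketch assumes the probe counts after each stage (pruning, loosening, exclusive pruning, and ignoring conflicts) have already been computed. Several details are assumptions made for illustration and are not stated in the text:

- The 80% and 40% cut-offs are fixed rather than user defined.
- The branches are tried in the order shown.
- A gene at or below 40% proceeds to loosening without the "i" tag.

```c
/* Hypothetical per-gene probe counts at each stage of the Fig. 11 flow. */
typedef struct {
    int desired;            /* user-specified number of probes          */
    int after_pruning;      /* left after step 402                      */
    int after_loosening;    /* left after loosening the rules, step 414 */
    int conflicting_genes;  /* other genes sharing this gene's probes   */
    int after_exclusive;    /* left after exclusive pruning, step 422   */
    int without_conflicts;  /* left when conflicts are ignored, 428     */
} GeneProbeCounts;

/* True when n is more than pct percent of desired. */
static int more_than(int n, int desired, int pct)
{
    return 100L * n > (long)pct * desired;
}

/* Writes the letters appended to the gene name into tag (3 bytes) and
 * returns 0, or returns -1 for the step 434 error case. */
int classify_gene(const GeneProbeCounts *g, char tag[3])
{
    int k = 0, rc = 0;

    if (more_than(g->after_pruning, g->desired, 80)) {   /* enough: next gene */
        tag[0] = '\0';
        return 0;
    }
    if (more_than(g->after_pruning, g->desired, 40))
        tag[k++] = 'i';                                  /* step 412 */
    if (more_than(g->after_loosening, g->desired, 80))
        tag[k++] = 'r';                                  /* steps 414-416 */
    else if (g->conflicting_genes >= 1 && g->conflicting_genes <= 2 &&
             more_than(g->after_exclusive, g->desired, 80))
        tag[k++] = 's';                                  /* steps 420-426 */
    else if (more_than(g->without_conflicts, g->desired, 80))
        tag[k++] = 'f';                                  /* steps 428-432 */
    else
        rc = -1;                                         /* step 434 */
    tag[k] = '\0';
    return rc;
}
```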

V. Synthesis of High Density Arrays 

Methods of forming high density arrays of oligonucleotides, peptides and other polymer
sequences with a minimal number of synthetic steps are known. The oligonucleotide analogue
array can be synthesized on a solid substrate by a variety of methods, including, but not
limited to, light-directed chemical coupling and mechanically directed coupling. See Pirrung et
al., U.S. Patent No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT
Publication Nos. WO 92/10092 and WO 93/09668, which disclose methods of forming vast arrays of
peptides, oligonucleotides and other molecules using, for example, light-directed synthesis
techniques. See also Fodor et al., Science, 251, 767-77. These procedures for synthesis of
polymer arrays are now referred to as VLSIPS™ procedures. Using the VLSIPS™ approach, one
heterogeneous array of polymers is converted, through simultaneous coupling at a number of
reaction sites, into a different heterogeneous array. See U.S. Application Serial Nos.
07/796,243 and 07/980,523.

The development of VLSIPS™ technology as described in the above-noted U.S. Patent No.
5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092 is considered pioneering
technology in the fields of combinatorial synthesis and screening of combinatorial libraries.
More recently, patent application Serial No. 08/082,937, filed June 25, 1993, describes methods
for making arrays of oligonucleotide probes that can be used to check or determine a partial or
complete sequence of a target nucleic acid and to detect the presence of a nucleic acid
containing a specific oligonucleotide sequence.

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a glass
surface proceeds using automated phosphoramidite chemistry and chip masking techniques. In one
specific implementation, a glass surface is derivatized with a silane reagent containing a
functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group.
Photolysis through a photolithographic mask is used selectively to expose functional groups
which are then ready to react with incoming 5'-photoprotected nucleoside phosphoramidites. The
phosphoramidites react only with those sites which are illuminated (and thus exposed by removal
of the photolabile blocking group). Thus, the phosphoramidites only add to those areas
selectively exposed from the preceding step. These steps are repeated until the desired array
of sequences has been synthesized on the solid surface. Combinatorial synthesis of different
oligonucleotide analogues at different locations on the array is determined by the pattern of
illumination during synthesis and the order of addition of coupling reagents.
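How the final sequences depend on the illumination pattern and on the order of reagent addition can be illustrated with a toy simulation. Each step is assumed to be a mask (a bit set of exposed sites) followed by coupling of one base at those sites. Real chemistry, protecting groups and coupling efficiency are not modeled, and the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define N_SITES 4
#define MAX_LEN 8

/* One light-directed step: the sites the mask leaves exposed, then the
 * nucleotide coupled at those sites. */
typedef struct {
    unsigned mask;   /* bit s set => site s is illuminated (deprotected) */
    char base;       /* nucleotide added in this coupling step */
} SynthesisStep;

/* Apply the steps in order. A base is appended only at exposed sites,
 * standing in for coupling after photolytic removal of the protecting
 * group; unexposed sites are left unchanged. */
void run_synthesis(const SynthesisStep *steps, int n_steps,
                   char sites[N_SITES][MAX_LEN + 1])
{
    for (int s = 0; s < N_SITES; s++)
        sites[s][0] = '\0';
    for (int k = 0; k < n_steps; k++) {
        for (int s = 0; s < N_SITES; s++) {
            size_t len = strlen(sites[s]);
            if (((steps[k].mask >> s) & 1u) && len < MAX_LEN) {
                sites[s][len] = steps[k].base;
                sites[s][len + 1] = '\0';
            }
        }
    }
}
```

Consider two steps: expose sites 0 and 1 while coupling A, then expose sites 1 and 2 while coupling C. This leaves A, AC and C at sites 0 to 2 and nothing at site 3. Changing either mask or the order of the two steps changes the resulting array.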

In the event that an oligonucleotide analogue with a polyamide backbone is used in the VLSIPS™
procedure, it is generally inappropriate to use phosphoramidite chemistry to perform the
synthetic steps, since the monomers do not attach to one another via a phosphate linkage.
Instead, peptide synthetic methods are substituted. See, e.g., Pirrung et al., U.S. Pat. No.
5,143,854.

Peptide nucleic acids are commercially available from, e.g., Biosearch, Inc. (Bedford, MA) and
comprise a polyamide backbone and the bases found in naturally occurring nucleosides. Peptide
nucleic acids are capable of binding to nucleic acids with high specificity, and are considered
"oligonucleotide analogues" for purposes of this disclosure.

In addition to the foregoing, additional methods which can be used to generate an array of
oligonucleotides on a single substrate are described in co-pending Application Ser. Nos.
07/980,523, filed November 20, 1992, and 07/796,243, filed November 22, 1991, and in PCT
Publication No. WO 93/09668. In the methods disclosed in these applications, reagents are
delivered to the substrate by either (1) flowing within a channel defined on predefined regions
or (2) "spotting" on predefined regions. However, other approaches, as well as combinations of
spotting and flowing, may be employed. In each instance, certain activated regions of the
substrate are mechanically separated from other regions when the monomer solutions are
delivered to the various reaction sites.

A typical "flow channel" method applied to the compounds and libraries of the present invention
can generally be described as follows. Diverse polymer sequences are synthesized at selected
regions of a substrate or solid support by forming flow channels on a surface of the substrate
through which appropriate reagents flow or in which appropriate reagents are placed. For
example, assume a monomer "A" is to be bound to the substrate in a first group of selected
regions. If necessary, all or part of the surface of the substrate in all or a part of the
selected regions is activated for binding by, for example, flowing appropriate reagents through
all or some of the channels, or by washing the entire substrate with appropriate reagents.
After placement of a channel block on the surface of the substrate, a reagent having the
monomer A flows through or is placed in all or some of the channel(s). The channels provide
fluid contact to the first selected regions, thereby binding the monomer A on the substrate
directly or indirectly (via a spacer) in the first selected regions.

Thereafter, a monomer B is coupled to second selected regions, some of which may be included
among the first selected regions. The second selected regions will be in fluid contact with a
second flow channel(s) through translation, rotation, or replacement of the channel block on
the surface of the substrate; through opening or closing a selected valve; or through
deposition of a layer of chemical or photoresist. If necessary, a step is performed for
activating at least the second regions. Thereafter, the monomer B is flowed through or placed
in the second flow channel(s), binding monomer B at the second selected locations. In this
particular example, the resulting sequences bound to the substrate at this stage of processing
will be, for example, A, B, and AB. The process is repeated to form a vast array of sequences
of desired length at known locations on the substrate.

After the substrate is activated, monomer A can be flowed through some of the channels, monomer
B can be flowed through other channels, a monomer C can be flowed through still other channels,
etc. In this manner, many or all of the reaction regions are reacted with a monomer before the
channel block must be moved or the substrate must be washed and/or reactivated. By making use
of many or all of the available reaction regions simultaneously, the number of washing and
activation steps can be minimized.

One of skill in the art will recognize that there are alternative methods of 
forming channels or otherwise protecting a portion of the surface of the substrate. For 
example, according to some embodiments, a protective coating such as a hydrophilic or 
hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of 
the substrate to be protected, sometimes in combination with materials that facilitate 
wetting by the reactant solution in other regions. In this manner, the flowing solutions are 
further prevented from passing outside of their designated flow paths. 

According to other embodiments the channels will be formed by depositing 
an electron or photoresist such as those used extensively in the semiconductor industry.
Such materials include polymethyl methacrylate (PMMA) and its derivatives, and electron
beam resists such as poly(olefin sulfones) and the like (more fully described in Chapter 10 
of Ghandi, VLSI Fabrication Principles, Wiley (1983)). According to these embodiments, 
a resist is deposited, selectively exposed, and etched, leaving a portion of the substrate 
exposed for coupling. These steps of depositing resist, selectively removing resist and 
monomer coupling are repeated to form polymers of desired sequence at desired locations. 

The "spotting" methods of preparing compounds and libraries of the present
invention can be implemented in much the same manner as the flow channel methods. For
example, a monomer A, or a dimer, trimer, tetramer, etc., or a fully
synthesized multimer, can be delivered to and coupled with a first group of reaction regions
which have been appropriately activated. Thereafter, a monomer B can be delivered to and
reacted with a second group of activated reaction regions. Unlike the flow channel
embodiments described above, reactants are delivered by directly depositing (rather than
flowing) relatively small quantities of them in selected regions. In some steps, of course,
the entire substrate surface can be sprayed or otherwise coated with a solution. In preferred
embodiments, a dispenser moves from region to region, depositing only as much monomer
as necessary at each stop. Typical dispensers include a micropipette to deliver the
monomer solution to the substrate and a robotic system to control the position of the
micropipette with respect to the substrate. In other embodiments, the dispenser includes a
series of tubes, a manifold, an array of pipettes, or the like so that various reagents can be
delivered to the reaction regions simultaneously.



VI. Hybridization.

Nucleic acid hybridization simply involves providing a denatured probe and
target nucleic acid under conditions where the probe and its complementary target can
form stable hybrid duplexes through complementary base pairing. The nucleic acids that
do not form hybrid duplexes are then washed away, leaving the hybridized nucleic acids to
be detected, typically through detection of an attached detectable label. It is generally
recognized that nucleic acids are denatured by increasing the temperature or decreasing the
salt concentration of the buffer containing the nucleic acids, or by the addition of chemical
agents or raising of the pH. Under low stringency conditions (e.g., low temperature
and/or high salt and/or high target concentration) hybrid duplexes (e.g., DNA:DNA,
RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly
complementary. Thus specificity of hybridization is reduced at lower stringency.
Conversely, at higher stringency (e.g., higher temperature or lower salt) successful
hybridization requires fewer mismatches.

One of skill in the art will appreciate that hybridization conditions may be
selected to provide any degree of stringency. In a preferred embodiment, hybridization is
performed at low stringency, in this case in 6X SSPE-T at about 40°C to about 50°C
(0.005% Triton X-100), to ensure hybridization, and then subsequent washes are performed
at higher stringency (e.g., 1X SSPE-T at 37°C) to eliminate mismatched hybrid duplexes.
Successive washes may be performed at increasingly higher stringency (e.g., down to as
low as 0.25X SSPE-T at 37°C to 50°C) until a desired level of hybridization specificity is
obtained. Stringency can also be increased by addition of agents such as formamide.
Hybridization specificity may be evaluated by comparison of hybridization to the test
probes with hybridization to the various controls that can be present (e.g., expression level
control, normalization control, mismatch controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency)
and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest
stringency that produces consistent results and that provides a signal intensity greater than
approximately 10% of the background intensity. To identify this stringency, the
hybridized array may be washed in solutions of successively higher stringency and read
between each wash. Analysis of the data sets thus produced will reveal a wash stringency
above which the hybridization pattern is not appreciably altered and which provides
adequate signal for the particular oligonucleotide probes of interest.
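The wash-series analysis can be sketched as follows. This is a hypothetical Python illustration, not a procedure taken from this disclosure: "consistent results" is modeled as a Pearson correlation of at least `min_r` between successive reads (an assumed cutoff), and the adequate-signal test uses the approximately 10% of background figure given above. The function names and the intensity data are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of probe intensities."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def choose_wash(reads, background, min_fraction=0.10, min_r=0.95):
    """Pick the highest-stringency wash giving consistent, adequate signal.

    reads: (label, intensities) pairs ordered from lowest to highest
           stringency, with intensities in the same probe order in every read.
    A wash qualifies when its pattern correlates with the preceding read at
    r >= min_r (assumed cutoff) and its mean signal exceeds
    min_fraction * background.
    """
    best = None
    for (_, previous), (label, current) in zip(reads, reads[1:]):
        mean_signal = sum(current) / len(current)
        if pearson(previous, current) >= min_r and mean_signal > min_fraction * background:
            best = label
    return best


# Hypothetical reads of four probes after each successive wash.
reads = [
    ("6X SSPE-T", [900, 850, 700, 650]),
    ("1X SSPE-T", [800, 300, 600, 200]),    # mismatched duplexes being removed
    ("0.5X SSPE-T", [700, 260, 520, 170]),  # pattern now stable
    ("0.25X SSPE-T", [30, 45, 20, 40]),     # signal collapsing into noise
]
print(choose_wash(reads, background=100))  # 0.5X SSPE-T
```

Comparing each read only with the one before it is a simple stand-in for "not appreciably altered"; a real analysis might compare against several earlier reads or use the control probes.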

In a preferred embodiment, background signal is reduced by the use of a
detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, Cot-1 DNA, etc.) during
the hybridization to reduce non-specific binding. In a particularly preferred embodiment,
the hybridization is performed in the presence of about 0.1 to about 0.5 mg/ml DNA (e.g.,
herring sperm DNA). The use of blocking agents in hybridization is well known to those
of skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra).

The stability of duplexes formed between RNAs or DNAs is generally in
the order RNA:RNA > RNA:DNA > DNA:DNA, in solution. Long probes have better
duplex stability with a target, but poorer mismatch discrimination than shorter probes
(mismatch discrimination refers to the measured hybridization signal ratio between a
perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers)
discriminate mismatches very well, but the overall duplex stability is low.
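The mismatch discrimination ratio defined above can be written as a small Python helper. This is a sketch that assumes a common background value is subtracted from both probes before the ratio is taken; that background correction, the function name, and the intensities are illustrative assumptions, not measured data.

```python
def discrimination_ratio(perfect_match, mismatch, background=0.0):
    """Perfect-match signal divided by single-base-mismatch signal.

    background: assumed common background subtracted from both probes first.
    """
    pm = perfect_match - background
    mm = mismatch - background
    if mm <= 0:
        return float("inf")  # mismatch probe indistinguishable from background
    return pm / mm


# Hypothetical intensities: a long probe pair and a short (e.g., 8-mer) pair.
print(discrimination_ratio(5000, 2500))                # 2.0
print(discrimination_ratio(900, 200, background=100))  # 8.0
```

In this made-up example the long probe gives more signal but a ratio of only 2, while the short probe gives less signal but a ratio of 8. This mirrors the stability versus discrimination tradeoff described above.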

Altering the thermal stability (Tm) of the duplex formed between the target
and the probe using, e.g., known oligonucleotide analogues allows for optimization of
duplex stability and mismatch discrimination. One useful aspect of altering the Tm arises
from the fact that adenine-thymine (A-T) duplexes have a lower Tm than guanine-cytosine
(G-C) duplexes, due in part to the fact that the A-T duplexes have 2 hydrogen bonds per
base pair, while the G-C duplexes have 3 hydrogen bonds per base pair. In heterogeneous
oligonucleotide arrays in which there is a non-uniform distribution of bases, it is not
generally possible to optimize hybridization for each oligonucleotide probe
simultaneously. Thus, in some embodiments, it is desirable to selectively destabilize G-C
duplexes and/or to increase the stability of A-T duplexes. This can be accomplished, e.g.,
by substituting guanine residues in the probes of an array which form G-C duplexes with
hypoxanthine, or by substituting adenine residues in probes which form A-T duplexes with
2,6-diaminopurine, or by using the salt tetramethylammonium chloride (TMACl) or other
alkylated ammonium salts in place of NaCl.
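The spread in duplex stability across a heterogeneous array can be illustrated with the Wallace rule, a common rough estimate for short oligonucleotides: 2°C per A/T base plus 4°C per G/C base. This rule of thumb is not part of this disclosure and ignores salt, concentration, and nearest-neighbor effects. The probe sequences in the sketch are hypothetical.

```python
def wallace_tm(seq):
    """Rough Tm in deg C by the Wallace rule: 2 per A/T base, 4 per G/C base."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for probe in ["ATATTTAA", "ATGCATGC", "GCGCGGCC"]:  # hypothetical 8-mers
    print(probe, wallace_tm(probe))  # 16, 24 and 32 respectively
```

Even among probes of identical length, the estimates span 16°C. This is why a single hybridization temperature cannot be optimal for every probe, and why evening out A-T and G-C contributions (e.g., with TMACl or base analogues) is attractive.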

Altered duplex stability conferred by using oligonucleotide analogue probes
can be ascertained by following, e.g., fluorescence signal intensity of oligonucleotide
analogue arrays hybridized with a target oligonucleotide over time. The data allow
optimization of specific hybridization conditions at, e.g., room temperature (for simplified
diagnostic applications in the future).

Another way of verifying altered duplex stability is by following the signal 
intensity generated upon hybridization with time. Previous experiments using DNA targets 
and DNA chips have shown that signal intensity increases with time, and that the more 
stable duplexes generate higher signal intensities faster than less stable duplexes. The 
signals reach a plateau or "saturate" after a certain amount of time due to all of the binding
sites becoming occupied. These data allow for optimization of hybridization, and 
determination of the best conditions at a specified temperature. 
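The approach to saturation can be sketched by assuming, purely for illustration, a pseudo-first-order binding curve S(t) = S_max(1 - e^(-kt)). This functional form is an assumption, not a model stated in this disclosure. Under it, the time needed to reach a given fraction of the plateau depends only on the rate constant k. The rate constants below are hypothetical.

```python
import math


def binding_signal(t, s_max, k):
    """Assumed pseudo-first-order curve: S(t) = s_max * (1 - exp(-k * t))."""
    return s_max * (1.0 - math.exp(-k * t))


def time_to_fraction(k, fraction=0.95):
    """Time at which S(t) reaches `fraction` of its plateau (independent of s_max)."""
    return -math.log(1.0 - fraction) / k


# Hypothetical rate constants (per minute) for a more and a less stable duplex.
for name, k in [("stable", 0.10), ("less stable", 0.02)]:
    print(name, round(time_to_fraction(k), 1), "min to 95% of plateau")
```

Fitting k to each probe's time course would let the readings be compared at a common fraction of saturation rather than at an arbitrary time point.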

Methods of optimizing hybridization conditions are well known to those of
skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology,
Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y. (1993)).



VII. Detection Methods 

Methods for detection depend upon the label selected and are known to
those of skill in the art. Thus, for example, where a colorimetric label is used, simple
visualization of the label is sufficient. Where a radioactive labeled probe is used, detection
of the radiation (e.g., with photographic film or a solid state detector) is sufficient.

As explained above, the use of a fluorescent label is preferred because of its
extreme sensitivity and simplicity. Standard procedures are used to determine the
positions where interactions between a target sequence and a reagent take place. For
example, if a target sequence is labeled and exposed to an array of different
oligonucleotide probes, only those locations where the oligonucleotides interact with the
target (sample nucleic acid(s)) will exhibit significant signal. In addition to using a label,
other methods may be used to scan the matrix to determine where interaction takes place.
The spectrum of interactions can, of course, be determined in a temporal manner by
repeated scans of interactions which occur at each of a multiplicity of conditions.
However, instead of testing each individual interaction separately, a multiplicity of
sequence interactions may be simultaneously determined on a matrix.

B) Scanning System

In a preferred embodiment, the hybridized array is excited with a light
source at the excitation wavelength of the particular fluorescent label and the resulting
fluorescence at the emission wavelength is detected. In a particularly preferred 
embodiment, the excitation light source is a laser appropriate for the excitation of the 
fluorescent label. 

Detection of the fluorescence signal preferably utilizes a confocal
microscope, more preferably a confocal microscope automated with a computer-controlled
stage to automatically scan the entire high density array. The microscope may be equipped
with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD camera, etc.)
attached to an automated data acquisition system to automatically record the fluorescence
signal produced by hybridization to each oligonucleotide probe on the array. Such
automated systems are described at length in U.S. Patent No. 5,143,854, PCT Application
WO 92/10092, and copending U.S.S.N. 08/195,889 filed on February 10, 1994. Use of laser
illumination in conjunction with automated confocal microscopy for signal detection
permits detection at a resolution of better than about 100 µm, more preferably better than
about 50 µm, and most preferably better than about 25 µm.

With the automated detection apparatus, the correlation of specific
positional labeling is converted to the presence on the target of sequences for which the
oligonucleotides have specificity of interaction. Thus, the positional information is
directly converted to a database indicating what sequence interactions have occurred. For
example, in a nucleic acid hybridization application, the sequences which have interacted
between the substrate matrix and the target molecule can be directly listed from the
positional information. A preferred detection system is described in PCT publication no.
WO 90/15070 and U.S.S.N. 07/624,120. Although the detection described therein is a
fluorescence detector, the detector can be replaced by a spectroscopic or other detector.
The scanning system can make use of a moving detector relative to a fixed substrate, a
fixed detector with a moving substrate, or a combination. Alternatively, mirrors or other
apparatus can be used to transfer the signal directly to the detector. See, e.g., U.S.S.N.
07/624,120.
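The conversion of positional signal into a list of interacting sequences amounts to a lookup against the known synthesis layout. The Python sketch below assumes a simple fixed intensity threshold for scoring an interaction. The threshold, the layout, and the intensities are hypothetical.

```python
def interacting_sequences(intensities, layout, threshold):
    """List the probe sequences at positions whose signal exceeds threshold.

    intensities: dict (row, col) -> measured intensity at that feature
    layout: dict (row, col) -> probe sequence synthesized at that feature
    threshold: assumed intensity cutoff for scoring an interaction
    """
    return sorted(layout[pos] for pos, value in intensities.items() if value > threshold)


layout = {(0, 0): "ACGTACGT", (0, 1): "TTGCAACG", (1, 0): "GGATCCAA", (1, 1): "CATGCATG"}
intensities = {(0, 0): 1200, (0, 1): 45, (1, 0): 980, (1, 1): 60}
print(interacting_sequences(intensities, layout, threshold=200))
```

Because the sequence at every position is known from synthesis, no sequence inference is needed at this step, only the decision of which positions carry real signal.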

The detection method will typically also incorporate some signal processing
to determine whether the signal at a particular matrix position is a true positive or may be a
spurious signal. For example, a signal from a region which has actual positive signal may
tend to spread over and provide a positive signal in an adjacent region which actually
should not have one. This may occur, e.g., where the scanning system does not
discriminate with sufficiently high pixel resolution to separate the two
regions. Thus, the signal over the spatial region may be evaluated pixel by pixel to
determine the locations and the actual extent of positive signal. A true positive signal
should, in theory, show a uniform signal at each pixel location. Thus, when the number of
pixels is plotted against signal intensity, a true positive should show a clearly uniform
signal intensity. Regions where the signal intensities show a fairly wide dispersion may be
particularly suspect, and the scanning system may be programmed to more carefully scan
those positions.
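One simple way to implement the dispersion check is to compute the coefficient of variation (standard deviation divided by mean) of the pixel intensities within each feature and flag wide spreads for rescanning. This Python sketch is an illustration under assumptions: the CV statistic and the 0.25 cutoff are choices made for the example, not values from this disclosure, and the pixel data are invented.

```python
from statistics import mean, pstdev


def flag_for_rescan(feature_pixels, max_cv=0.25):
    """Return IDs of features whose pixel intensities are too dispersed.

    feature_pixels: dict mapping feature id -> list of pixel intensities
    max_cv: assumed cutoff on the coefficient of variation (stdev / mean)
    """
    flagged = []
    for feature, pixels in feature_pixels.items():
        m = mean(pixels)
        cv = pstdev(pixels) / m if m > 0 else float("inf")
        if cv > max_cv:
            flagged.append(feature)
    return flagged


features = {
    "A1": [1000, 980, 1010, 995],  # uniform: consistent with a true positive
    "A2": [900, 150, 820, 120],    # patchy: e.g., bleed-over from a neighbor
}
print(flag_for_rescan(features))  # ['A2']
```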

More sophisticated signal processing techniques can be applied to the initial
determination of whether a positive signal exists or not. See, e.g., U.S.S.N. 07/624,120
and discussion below in Section XII.

VIII. Ligation-Enhanced Signal Detection.
A) General Ligation Reaction.

Ligation reactions can be used to discriminate between fully complementary
hybrids and those that differ by one or more base pairs, particularly in cases where the
mismatch is near the 5' terminus of the probe oligonucleotide. Use of a ligation reaction in
signal detection increases the stability of the hybrid duplex, improves hybridization
specificity (particularly for shorter probe oligonucleotides, e.g., 5 to 12 mers), and,
optionally, provides additional sequence information.
Various components for use of ligation reaction(s) in combination with
generic difference arrays are illustrated in Figure 13a. In its simplest embodiment, the
probe oligonucleotide/ligation reaction system includes an array of oligonucleotide probes.
As discussed above, the oligonucleotide probes can be randomly selected, haphazardly
selected, composition biased, inclusive of all possible oligonucleotides of a particular
length, and so forth. The oligonucleotide probes can optionally include a predetermined
"constant" region (see Fig. 13a) which has substantially the same sequence for substantially
all of the probe oligonucleotides on the array.

Where the probe comprises a constant region it also preferably comprises a
"variable region" (see Fig. 13a) which can be randomly selected, haphazardly selected,
composition biased, inclusive of all possible oligonucleotides of a particular length, and so
forth. When constant and variable regions are present, a sample nucleic acid that
hybridizes to the oligonucleotide probe typically hybridizes to at least the variable region
and optionally to the constant region as well.
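A probe set that pairs one shared constant region with every possible variable region of length k can be enumerated directly. This Python sketch writes sequences 5' to 3' and lets the caller choose the order of the two regions, since that order can be altered. The constant sequence "CCGG" and the function name are hypothetical.

```python
from itertools import product


def probe_set(constant, k, variable_first=True):
    """All probes built from one shared constant region and every k-mer
    variable region, written 5'->3' (4**k probes in total).

    variable_first selects the order of the two regions.
    """
    probes = []
    for bases in product("ACGT", repeat=k):
        variable = "".join(bases)
        probes.append(variable + constant if variable_first else constant + variable)
    return probes


probes = probe_set("CCGG", 3)  # "CCGG" is a hypothetical constant region
print(len(probes), probes[0], probes[-1])  # 64 AAACCGG TTTCCGG
```

The set grows as 4**k, so an array covering every 8-mer variable region needs 65,536 distinct probes.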

The probe oligonucleotide/ligation reaction system also optionally includes
a nucleic acid that is complementary to the constant region. This complement may be a
subsequence of a sample nucleic acid or a separate oligonucleotide. When the complement
to the constant region is a separate oligonucleotide, hybridization to the constant region
provides a ligation site (see Fig. 13a, ligation site A). The hybridized complement to the
constant region can optionally be permanently crosslinked to the constant region by the use
of cross-linking reagents (e.g., psoralens). The sample nucleic acid, and/or the ligatable
oligonucleotide can optionally be labeled. Where both are labeled, the labels can be the
same or distinguishable.

The probe oligonucleotide/ligation reaction system optionally includes a
ligatable oligonucleotide that can be ligated to the free terminus of the variable region (see
Fig. 13a, ligation site B). The ligatable oligonucleotide can be a single oligonucleotide of
known nucleotide sequence, a collection of nucleic acids, or all possible oligonucleotides
of a particular length.

These various components of the probe oligonucleotide/ligation reaction
system can be combined in a variety of ways to increase the stability of the hybrid duplex,
and/or improve hybridization specificity (particularly for shorter probe oligonucleotides,
e.g., 5 to 12 mers), and/or provide sequence information. Various uses of the probe
oligonucleotide/ligation reaction system are described in detail below.

While Figure 13a illustrates ligation components in solid phase, similar
approaches and components can be used in solution phase. It will be appreciated that the
order of the constant region and variable region can be altered. In addition, a probe
oligonucleotide may comprise multiple constant regions and/or multiple variable regions.
Further, while Fig. 13a illustrates the probe oligonucleotide attached to a solid support
by a 3' terminus, the probe can also be reversed and attached via the 5' terminus.

It will be appreciated that sequences or subsequences of the probe
oligonucleotide where variable regions are present or absent can act as a primer site for
initiation of polymerization using the remainder of the probe oligonucleotide and/or the
ligation oligonucleotide and/or the sample nucleic acid as a polymerization template.

B) Ligation Reactions to Discriminate Mismatches at Probe Termini, Target
Termini, or Both Termini

In one embodiment, a simple ligation reaction discriminates mismatches at
or near the terminus of the probe oligonucleotide (see Fig. 13b). Typically, the nucleic
acid fragments comprising the sample nucleic acid are longer than the probe
oligonucleotides in the array, so that, when hybridized, the target nucleic acid typically has
an overhang. When the array comprises probe oligonucleotides attached through their 3'
termini, the hybridized target (sample) nucleic acid provides a 3' overhang. In this
embodiment, the target nucleic acid is not necessarily labelled (see, e.g., Fig. 13b).

When the array of oligonucleotides is combined with the target nucleic acid
to form target-oligonucleotide hybrid complexes, the target-oligonucleotide hybrid
complexes are contacted with a ligase and a labelled, ligatable oligonucleotide or,
alternatively, with a pool of labelled, ligatable probes. While the hybridization of the
sample nucleic acids and the ligatable probes can be performed sequentially, in a preferred
embodiment both hybridization and ligation are performed simultaneously (i.e., the target,
ligatable oligonucleotide, and ligase are all added together). The pool may comprise
particular preselected probes or may be a collection of all possible probes of a particular
length (e.g., 3-mer up to 12-mer) (see, e.g., Fig. 13b).
The ligation reaction of the labelled, ligatable probes to the phosphorylated
5' end of the oligonucleotide probes on the substrate will occur, in the presence of the
ligase, predominantly when the target:oligonucleotide hybrid has formed with correct
base-pairing near the 5' end of the oligonucleotide probe and where there is a suitable 3'
overhang of the target nucleic acid to serve as a template for hybridization and ligation (see
Fig. 12). After the ligation reaction, the substrate is washed (multiple times if necessary)
under conditions suitable to remove the target nucleic acid and the labeled, unligated
probes (e.g., above 40°C to 50°C, or under otherwise highly stringent conditions).

Thereafter, a fluorescence image (e.g., a quantitative fluorescent image) of
the hybridization pattern is obtained as described above in Section VII(B). Labeled
oligonucleotide probes, i.e., the oligonucleotide probes which are complementary to the
target nucleic acid, are identified. The presence, absence, and/or intensity of the
hybridization signal provides information regarding the presence and level of the nucleic
acid sequence or subsequence in the nucleic acid sample as described above.
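The orientation logic of this simple ligation can be sketched in Python with all strands written 5' to 3'. The surface probe is attached by its 3' end. The labeled ligatable oligonucleotide pairs with the target's 3' overhang, and its 3' end is joined to the probe's 5' phosphate, so the ligated product reads ligatable + probe. As a simplifying assumption, this model scores a ligation only if the whole product pairs perfectly with a contiguous stretch of the target. That is stricter than a real ligase, whose selectivity is concentrated near the junction. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ligates(probe, ligatable, target):
    """Idealized simple-ligation readout (all sequences written 5'->3').

    probe: surface probe attached by its 3' end, with a free 5' phosphate.
    ligatable: labeled oligonucleotide whose 3' end abuts the probe's 5' end
               when both are paired to the target's 3' overhang.
    Returns True when the ligated product (ligatable + probe) pairs perfectly
    with a contiguous stretch of the target.
    """
    return revcomp(ligatable + probe) in target


target = "CCTTGACCGTTTAGCCGA"                 # hypothetical sample strand
print(ligates("ACGGTCAA", "GGCTAA", target))  # True: perfect match
print(ligates("TCGGTCAA", "GGCTAA", target))  # False: probe 5'-terminal mismatch
print(ligates("ACGGTCAA", "GGCTAT", target))  # False: mismatch at the junction
```

Only features whose probe acquired the labeled oligonucleotide fluoresce after the stringent wash, which is the pattern read out in the fluorescence image.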

Any enzyme that catalyzes the formation of a phosphodiester bond at the
site of a single-stranded break in duplex DNA can be used to enhance discrimination
between fully complementary hybrids and those that differ by one or more base pairs.
Such ligases include, but are not limited to, T4 DNA ligase, ligases isolated from E. coli,
and ligases isolated from other bacteria and bacteriophages. The concentration of the
ligase will vary depending on the particular ligase used, the concentration of target, and
buffer conditions, but will typically range from about 50 units/ml to about 5,000 units/ml.
Moreover, the time in which the array of target:oligonucleotide hybridization complexes is
in contact with the ligase will vary. Typically, the ligase treatment is carried out for a
period of time ranging from minutes to hours. Methods of performing ligase
discrimination can be found in copending USSN 08/533,582, filed on October 18, 1995,
and in Jackson et al. (1996) Nature Biotechnology, 14: 1685-1691.

It will be appreciated that the method described above primarily
discriminates mismatches at or near the 5' terminus of the surface bound probe
oligonucleotide, but does little to discriminate mismatches at, or near, the 5' terminus of the
target (sample) nucleic acid (see Fig. 13b).
In another embodiment, a ligation can be used to discriminate mismatches
at, or near, the end of the sample nucleic acid (Fig. 13c). In this instance, the probe
oligonucleotides comprise a constant region and a variable region (e.g., the variable
regions can include all possible 8 mers as illustrated in Fig. 13c). A constant
oligonucleotide (complementary to the constant region or a subsequence thereof) is
hybridized to the constant region and cross-linked (e.g., covalently bound) at that location.
The remainder of the probe oligonucleotide (e.g., the variable region or subsequences
thereof and optionally a subsequence of the constant region) forms a 5' overhang to which
the nucleic acid sample can hybridize. Where there are no mismatches at or near the
terminus of the sample oligonucleotide, a ligation event then joins the sample
oligonucleotide to the constant oligonucleotide. Free nucleic acids are washed away,
leaving bound hybridized sample oligonucleotides which can then be detected.

In still another embodiment, a double ligation (illustrated in Fig. 13d) can
be used to discriminate mismatches at or near the ends of both the probe oligonucleotide
and the target nucleic acid. In this approach, the probe oligonucleotides each comprise a
constant region and a variable region as described above in VIII(A). The surface bound
oligonucleotide probes are hybridized to a constant oligonucleotide having a sequence
which is complementary to the constant region of the oligonucleotide probes. The sample
(target) nucleic acids are contacted to the hybrid duplex in the presence of a ligase. Where
there is no terminal mismatch between the sample nucleic acid and the variable region, the
ligation is successful, resulting in the ligation of the constant oligonucleotide to the sample
nucleic acid (see "first ligation" in Fig. 13d). This ligation thus discriminates mismatches
at the terminus of the sample nucleic acid.

The hybridized duplex is contacted with a pool of labeled ligatable
oligonucleotides. Where a ligatable probe is complementary to the overhang produced by
the hybridized sample nucleic acid and there are no mismatches at or near the free terminus
of the variable region of the probe oligonucleotide, a second ligation will attach the labeled
ligatable probe (see Fig. 13d). The second ligation thus discriminates against mismatches
at the free terminus of the probe oligonucleotide. It will be appreciated that the various
hybridization and ligation reactions may be carried out sequentially or simultaneously, and
in a preferred embodiment are carried out simultaneously.
As with the previously described method, any enzyme that catalyzes the
formation of a phosphodiester bond at the site of a single-strand break in duplex DNA can
be used to enhance discrimination between fully complementary hybrids and those that
differ by one or more base pairs. Such ligases include, but are not limited to, T4 DNA
ligase, ligases isolated from E. coli, and ligases isolated from other bacteria or
bacteriophages. The concentration of the ligase will vary depending on the particular
ligase used, the concentration of target, and buffer conditions, but will typically range from
about 50 units/ml to about 5,000 units/ml. Moreover, the time in which the array of target
oligonucleotide:oligonucleotide probe hybrid complexes is in contact with the ligase will
vary. Typically, the ligase treatment is carried out for a period of time ranging from
minutes to hours. In addition, it will be readily apparent to those of skill that the two
ligation reactions can either be done sequentially or, alternatively, simultaneously in a
single reaction mix that contains: target oligonucleotides; constant oligonucleotides; a
pool of labeled, ligatable probes; and a ligase.

In this dual ligation method, the first ligation reaction generally occurs only
if the 5' end of the target oligonucleotide (i.e., the last 3-4 bases) matches the variable
region of the oligonucleotide probe. Similarly, the second ligation reaction, which adds a
label to the probe, generally occurs efficiently only if the first ligation reaction was
successful and if the ligated target is complementary to the 5' end of the probe. Thus, this
method provides for specificity at both ends of the variable region. Moreover, this method
is advantageous in that it allows a shorter variable probe region to be used, increases
probe:target specificity, and removes the necessity of labeling the target. Dual ligation
methods of this sort are described in detail in copending USSN 08/533,582, filed on
October 18, 1995.
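The two conditions of the dual ligation method can be expressed as a Python sketch that checks base pairing at each end of the variable region. In this model, the probe's variable region is written 5' to 3', so index 0 is the probe's free 5' end. The target bases opposite it are listed in the same left-to-right order, so the target reads 3' to 5' and its 5' end sits at the right, next to the constant oligonucleotide. The window of n = 4 terminal base pairs follows the "last 3-4 bases" mentioned above. The function name and sequences are hypothetical.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def dual_ligation(variable, opposite, n=4):
    """Idealized outcome of the two ligations in the dual-ligation scheme.

    variable: probe variable region, 5'->3' (index 0 is the probe's free 5' end).
    opposite: target bases paired with each probe base, in the same
              left-to-right order (target 5' end at the right).
    n: number of terminal base pairs each ligation is assumed to inspect.
    Returns (first_ligation, labeled); labeling requires both ligations.
    """
    paired = [(a, b) in PAIRS for a, b in zip(variable, opposite)]
    first = all(paired[-n:])             # target 5' end vs. variable region
    labeled = first and all(paired[:n])  # probe 5' end, templated by target
    return first, labeled


print(dual_ligation("ACGTTGCA", "TGCAACGT"))  # (True, True): perfect duplex
print(dual_ligation("ACGTTGCA", "TGCAACGA"))  # (False, False): target-end mismatch
print(dual_ligation("ACGTTGCA", "AGCAACGT"))  # (True, False): probe-end mismatch
```

Because a label is added only when both ends pair correctly, even a short variable region is checked at both termini, consistent with the advantages noted above.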

In another embodiment, after hybridization of the oligonucleotide
complementary to the constant region of the probe oligonucleotides, the hybrid duplex
formed thereby can be permanently cross-linked so as to prevent subsequent denaturation
of the hybrid duplex. When the sample nucleic acid is ligated to the overhang thus formed,
it is also permanently attached to the solid support. In this embodiment, the use of a
ligatable oligonucleotide is optional. The sample nucleic acid may itself be labeled, thereby
permitting detection of the ligated sample nucleic acids.
Methods for cross-linking nucleic acids are well known to those of skill in
the art. Such methods include, but are not limited to, baking, exposure to UV, exposure to
ionizing radiation, and contact with chemical cross-linking reagents. In a particularly
preferred embodiment, cross-linking is accomplished by the formation of covalent bonds
with chemical cross-linking reagents. Preferred cross-linking reagents include bifunctional
cross-linking reagents, and cross-linking is accomplished by chemical or photoactivation of
the cross-linking reagent with the nucleic acids. The reagents may be applied after
formation of hybrid duplexes, but in a preferred embodiment, the cross-linker is initially
attached to either the probe or complementary (to the constant region) nucleic acids before
hybridization.

The cross-linking reagent can be any bifunctional molecule which
covalently cross-links the tester nucleic acid to a hybridized driver nucleic acid. Generally
the cross-linking agent will be a bifunctional photoreagent which will be monoadducted to
the tester or driver nucleic acids, leaving a second photochemically reactive residue which
can bind covalently to the corresponding hybridized nucleic acid upon photoexcitation.
The cross-linking molecule may also be a mixed chemical and photochemical bifunctional
reagent which can be non-photochemically bound to the probe or tester nucleic acids via a
chemical reaction such as alkylation, condensation, or addition, followed by photochemical
binding to the corresponding hybridized nucleic acid. Bifunctional chemical cross-linking
molecules activated either catalytically or by high temperature following hybridization
may also be employed.

Examples of bifunctional photoreagents include furocoumarins,
benzodipyrones, and bis azides such as bis-azido ethidium bromide. Examples of mixed
bifunctional reagents with both chemical and photochemical binding moieties include
haloalkyl-furocoumarins, haloalkyl-benzodipyrones, haloalkyl-coumarins and various
azido nucleoside triphosphates.

Particularly preferred cross-linkers include psoralens
such as 8-methoxypsoralen, 5-methoxypsoralen and 4,5',8-trimethylpsoralen, and the like.
Other suitable cross-linkers include cis-benzodipyrone and trans-benzodipyrone. The
cross-linker known commercially as sorlon is also suitable. For a detailed description of
the cross-linking of hybridized nucleic acids see WO 85/02628.
The foregoing enhancement discrimination methods involving the use of
ligation reactions can be used in all instances where improved discrimination between fully
complementary hybrids and those that differ by one or more base pairs would be helpful.
More particularly, such methods can be used to more accurately determine the sequence
(e.g., de novo sequencing), monitor expression, monitor mutations, or resequence the target
nucleic acid (i.e., such methods can be used in conjunction with a second sequencing
procedure to provide independent verification). The foregoing is intended to illustrate, and
not restrict, the way in which an array of target:oligonucleotide hybrid complexes can be
treated with a ligase and a pool of labeled, ligatable probes to improve hybridization
signals on high density oligonucleotide arrays.

C) Ligation Reaction to Add Sequence Information.

i) Extended sequence information from simple ligation. 
The ligation reactions described above can also be used to increase the
sequence information obtained regarding the hybridized nucleic acid. It will be
appreciated that the nucleotide sequence of each probe oligonucleotide on the high density
oligonucleotide array is known. Specific hybridization to a sample nucleic acid indicates
that the hybridized sample nucleic acid has a sequence or subsequence complementary to
the hybridized probe oligonucleotide. Thus a hybridization event provides sequence
information that can be used to identify the nucleic acids (e.g., gene transcripts) present in
the hybridized sample. Generally speaking, the sequence information obtained is governed
by the length of the probe oligonucleotide. Thus, where the probe oligonucleotide is an 8
mer, 8 nucleotides of sequence information are obtained.

However, the ligation discrimination reactions described above can be used to provide additional sequence information. In this embodiment, rather than every possible ligatable oligonucleotide of a given length, the array and sample nucleic acids are hybridized to predetermined ligatable oligonucleotides in which the nucleotides at one or more positions are known. Successful hybridization and ligation of the labeled oligonucleotide thus indicates that the hybridized sample nucleic acid has nucleotides complementary to the ligatable oligonucleotide in addition to the probe oligonucleotide. Thus, for example, where the probe oligonucleotide is an 8 mer and specific 6 mer ligatable probes are used, the resulting hybridization will provide 14 nucleotides worth of sequence information.
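As a minimal sketch of the bookkeeping behind the 8 + 6 = 14 nucleotide figure, the Python below assumes (hypothetically) that the ligated oligonucleotide is joined to the 3' end of the surface-bound probe, so that the joined strand reads probe followed by ligated oligonucleotide, 5' to 3'. The target stretch reported by a successful ligation is then the reverse complement of that joined strand. The function names and the example sequences are illustrative only.

```python
# Base-pairing table for uppercase DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def target_from_ligation(probe: str, ligated: str) -> str:
    """Target sequence (5'->3') read out by a probe plus its ligated oligonucleotide.

    Hypothetical orientation: the ligated oligo is joined to the probe's 3' end,
    so the joined strand is probe + ligated and the target is its reverse complement.
    """
    return reverse_complement(probe + ligated)


# An 8 mer probe plus a 6 mer ligated oligonucleotide reports 14 target bases.
read = target_from_ligation("ACGTTGCA", "GGATCC")
print(len(read), read)  # 14 GGATCCTGCAACGT
```

If the ligation instead occurs at the probe's 5' end, only the concatenation order changes; the length of the readout is the same.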

Where different ligatable oligonucleotides are used in this context, it is desirable to distinguish between the various ligated oligonucleotides. This can be accomplished by sequential ligations with each different species of ligatable probe, each followed by reading of the array. Alternatively, each species of ligatable oligonucleotide can be labeled with a different detection label, allowing simultaneous ligation and subsequent detection of the various different labels.



ii) Use of a generic ligation GeneChip for interrogating sequences adjacent to restriction sites in a complex (target) sample nucleic acid.
The generic difference arrays can be used to fingerprint complex DNA clones or to monitor the complex pattern of gene expression from a given source. In fingerprinting, a nucleic acid sequence (e.g., an 8 bp sequence) adjacent to a given restriction enzyme site is sequenced.

In fingerprinting, a restriction enzyme is used which cleaves the target at a frequency dependent on the length of the recognition sequence. The restriction digest thus generates nucleic acid fragments approximately uniformly distributed along the genomic DNA. For instance, a 4-cutter like Hsp92 II would cut a target about once every several hundred basepairs, whereas a 6-cutter like SacI would cut a target about once every several thousand (4,000) basepairs. With restriction enzyme fragments, the individual fragments are typically non-overlapping and average several thousand basepairs in length. For the purposes of fingerprinting, with a 6-cutter restriction enzyme it is possible to examine 2,000-3,000 fragments x 4,000 bases/fragment = 8-12 million basepairs per target.
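The fragment-size arithmetic above follows from a simple model in which each base is equally likely at every position, so a fully specified n base recognition site occurs on average once every 4^n bases. The sketch below (Python; function names hypothetical) reproduces the figures in the text under that assumption. Degenerate recognition sites and the base-composition biases of real DNA shift these numbers.

```python
def expected_spacing(site_length: int) -> int:
    """Mean distance (bp) between occurrences of a fully specified recognition site.

    Assumes a random sequence with equal A, C, G, T frequencies; real genomes deviate.
    """
    return 4 ** site_length


def bases_examined(n_fragments: int, mean_fragment_bp: int) -> int:
    """Total target length represented by a set of restriction fragments."""
    return n_fragments * mean_fragment_bp


print(expected_spacing(4))  # 256: a 4-cutter such as Hsp92 II
print(expected_spacing(6))  # 4096: a 6-cutter such as SacI
print(bases_examined(2000, 4000), bases_examined(3000, 4000))  # 8000000 12000000
```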

Thus, it is possible to routinely sort an 8-12 million basepair target in a high density array to measure expression differences or to monitor gene expression (see, e.g., Fig. 14c), thereby providing a characteristic expression "fingerprint" or abundance difference fingerprint for each restriction digest of the sample nucleic acid. The fingerprinting methods thus provide a means to subsample a nucleic acid population in a roughly uniform and reproducible manner and to determine expression profiles and/or abundance differences for the target nucleic acid thus subsampled.

In general, the method involves providing a high density generic difference screening array where the probe oligonucleotides comprise a constant region and a variable region as described above. In this instance, however, the last few bases of the constant region (anchor sequence) are selected to complement the 5' end of the restriction recognition site (see, e.g., Figs. 14a and 14b) and the complementary anchor sequence is shortened by the appropriate number of bases. The variable region can be randomly selected, haphazardly selected, or composition biased as described above. However, in a preferred embodiment, the variable region includes all possible nucleic acids of a particular length (e.g., all possible 3 mers, all possible 4 mers ... all possible 12 mers), more preferably all possible 8 mers.
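A variable region comprising "all possible 8 mers" can be enumerated directly. The Python sketch below builds each probe as a constant anchor followed by one variable region; the anchor "CTCG" and the function name are hypothetical placeholders, not sequences taken from Figs. 14a and 14b. Because the probe count grows as 4^k, each additional variable base multiplies the number of array features by four.

```python
from itertools import product


def generic_probes(anchor: str, k: int) -> list[str]:
    """Probe sequences: a constant anchor followed by every possible k base variable region."""
    return [anchor + "".join(v) for v in product("ACGT", repeat=k)]


# Hypothetical 4 base anchor; with all 8 mers the array needs 4**8 = 65,536 probes.
probes = generic_probes("CTCG", 8)
print(len(probes), probes[0], probes[-1])  # 65536 CTCGAAAAAAAA CTCGTTTTTTTT
```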

The sample nucleic acids are prepared by fragmentation using a restriction enzyme. Preferred restriction enzymes leaving only 0, 1, or 2 bases at the 5' end provide a greater specificity of ligation (e.g., SacI leaves just a 5' C and Hsp92 II leaves no recognition site bases at the 5' end). However, restriction enzymes leaving more bases at the 5' end can be used. Several restriction enzymes can be used simultaneously if they all leave the same recognition base at the 5' end. For instance, Aat II, SacI, SphI, HhaI, Bsp1286I, ApaI, Kpn I, and Ban II all leave just a C at the 5' end, making these compatible enzymes. Restriction enzymes and their characteristic recognition/cleavage sites are well known to those of skill in the art (see, e.g., Clontech catalogue, Clontech Laboratories Inc., Palo Alto, CA).

The digested target is then hybridized and ligated to the high density array, preferably in the presence of a complement to the constant region, using standard conditions (e.g., 30 °C, overnight, 800 U T4 ligase, T4 ligase buffer). The hybridization in effect sorts (locates and/or localizes) the sample nucleic acids, the position of the sample nucleic acids being determined by the sequence of the bases adjacent to the restriction site at the 5' end. The hybridization data can be used directly in an expression monitoring method as described above, or the same procedure can be performed on two or more sample nucleic acids for generic difference screening.
In a preferred embodiment, one of two formats is used. In Format I, the ligated fragment (e.g., the sample nucleic acid and, optionally, the complement to the constant region) is locked into place in the high density array by its attachment (e.g., by cross-linking) to the complement (e.g., by the use of a psoralen). The complementary strand to the fragment can be denatured and washed off of the array with a dilute base (e.g., 1 N NaOH). These cross-linked fragments can then be used as probes in a second round of hybridization to one or more nucleic acid samples. Differential nucleic acid abundances (e.g., differential gene expression) can then be monitored by comparing the hybridization patterns of different nucleic acids hybridized simultaneously or sequentially to the same array or to separate arrays.

In a second format (Format II), particularly where the sample nucleic acid is a deoxyribonucleic acid sample, the DNA is restriction digested as described above, and then directly hybridized/ligated to the generic difference array. Sites where intensity differences occur indicate a difference in nucleic acid abundance. The differentially abundant (e.g., differentially expressed) nucleic acid can be cloned by designing primers specific to that nucleic acid based upon the sequence information derived from the location of the probe in the array and the sequence of the recognition site. For an 8 mer (variable region) and a 6 base restriction enzyme, this gives a 14 mer primer sequence. For short genomes, a 14 mer primer may be used to isolate the clone. Longer genomes become more tractable as the length of the primary probes (variable region) increases beyond 8 mers.
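Why a 14 mer suffices for short genomes but not for long ones can be seen from a rough count of chance matches. The Python sketch below assumes a random genome with equal base frequencies and counts both strands; the function name and the genome sizes (about 4.6 Mb for a bacterium, about 3 Gb for a mammal) are illustrative. Under this model a 14 mer is expected to match by chance roughly 0.03 times in 4.6 Mb but about 22 times in 3 Gb.

```python
def expected_chance_matches(primer_length: int, genome_bp: float) -> float:
    """Expected number of exact chance matches of a primer in a genome.

    Assumes a random genome with equal base frequencies and counts both strands.
    """
    return 2 * genome_bp / 4 ** primer_length


for genome_bp in (4.6e6, 3.0e9):  # illustrative bacterial and mammalian genome sizes
    for length in (14, 16, 18):
        matches = expected_chance_matches(length, genome_bp)
        print(f"{genome_bp:.1e} bp, {length} mer: {matches:.3f}")
```

Each extra base divides the expected number of chance matches by four, which is the sense in which longer variable regions make longer genomes more tractable.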

The restriction enzyme digested sample nucleic acid is preferably labeled and ligated to the high density array in the fingerprinting method and in Format II (see discussion above and Fig. 14d). In the case of Format I assays, the ligated target sequence is preferably not labeled and, instead, serves as a hybridization probe in a second round of hybridization of labeled sample nucleic acids to the high density array.

To ensure that sites which have not been cleaved by the given restriction enzyme are not ligated to the high density array, alkaline phosphatase can be used to treat the sample nucleic acids before restriction enzyme digestion.

iii) Analysis of differential display fragments on a generic difference array
The principle behind differential display is to generate a set of randomly primed amplification (e.g., PCR) fragments from a first strand cDNA population transcribed from RNA using anchor primers of the form:

(T)nVA, (T)nVG, (T)nVC, and (T)nVT

in which V is A, G, or C, and n ranges from about 6 to about 30, preferably from about 8 to about 20, and more preferably from about 10 to about 16, with n = 14 being most preferred. Depending on which random primer and anchoring primer are chosen, different sets of cDNA transcripts are represented in a particular nucleic acid fragment set. These amplification fragments are analyzed by sorting the fragments on a generic screening oligonucleotide array, where they hybridize based on the sequence at the 5' end of the fragment.
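In the (T)nVA notation, V is a degenerate position, so each named anchor primer is in practice a mixture of three oligonucleotides. The Python sketch below expands the four anchor primers for a given n (n = 14 being most preferred); the function name and the dictionary layout are illustrative choices.

```python
def anchor_primer_sets(n: int = 14) -> dict[str, list[str]]:
    """Expand the degenerate anchor primers (T)nVA, (T)nVG, (T)nVC and (T)nVT.

    V is a degenerate position (A, G or C), so each named primer is a
    mixture of three oligonucleotides, written 5'->3'.
    """
    return {
        f"(T){n}V{last}": ["T" * n + v + last for v in "AGC"]
        for last in "AGCT"
    }


sets = anchor_primer_sets()
for name, members in sets.items():
    print(name, members)
```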

The method is illustrated in Figures 16a through 16e. First strand cDNA is synthesized by reverse transcription of poly(A) mRNA using an anchored poly(T) primer according to standard methods (Fig. 16a). The first strand DNA acts as a template for amplification (e.g., via PCR) using upstream primers comprising an engineered restriction site and one or more degenerate bases (N = A, G, C, T) at the 3' end. Randomly primed PCR is then performed using the upstream primers, the anchor primers and a random primer (e.g., anchor primers (T)14VA, (T)14VG, (T)14VC, (T)14VT and random primer, e.g., SacI site: 5'-CATGAGCTCNN). The resulting amplification fragments are then digested with a restriction endonuclease corresponding to the engineered restriction sites. The resulting sample nucleic acids are then hybridized to a generic difference screening array as described above.

The method is preferably performed on two or more nucleic acid samples, thereby allowing use of the generic difference screening methods of this invention. In one embodiment, the probe oligonucleotides comprise a constant region complementary to the remaining restriction site on the sample nucleic acids, if present. The remaining analysis proceeds as described above.

The method allows analysis of several thousand or even more "bands" (nucleic acids) simultaneously. Furthermore, sequence information is also provided on the differentially abundant nucleic acid. For example, where the random primer provides a 9 base tail (CATGAGCTC), the array can comprise probe oligonucleotides having a complementary 9 base constant region and variable regions comprising all possible 8 mers. This provides 17 nucleotides of sequence information for each hybridization (9 mer constant + 8 mer variable).

iv) Use of ligation to extract additional sequence information from restriction selected nucleic acid hybridizations.

Ligation reactions can also be used in combination with restriction digests to subsample the sample nucleic acids at approximately uniform intervals and simultaneously provide additional sequence information. In this embodiment, a high density array is provided in which the probe oligonucleotides comprise a nucleic acid sequence complementary to the sense or antisense strand of a restriction site (see, e.g., Fig. 14). The sample nucleic acids are digested randomly with a DNAse or specifically with a restriction endonuclease (e.g., Sau3A). The digested nucleic acids are then hybridized to the high density array. Only those nucleic acids having termini complementary to the constant regions will bind to the probe oligonucleotides. Thus, the restriction fragments will be preferentially selected.

The array is also hybridized with a pool of ligatable oligonucleotides comprising all possible oligonucleotides of a particular length (e.g., a 6 mer) in the presence of a ligase, thereby ligating the complementary ligatable oligonucleotides to the terminus of the probe oligonucleotide. This produces probe oligonucleotides increased in length by the length of the ligatable oligonucleotide and complementary to nucleic acids known to be present in the nucleic acid sample.

The DNA is then stripped off of the array and the elongated probes are used to perform generic difference screening of the nucleic acid samples as described above. When probes corresponding to nucleic acids differentially expressed in the various samples are identified, the known probe sequence can be used to identify the nucleic acids that are differentially abundant.

In one embodiment, this is accomplished by producing 4 primer oligonucleotides comprising the constant region plus the known variable region and 1 nucleotide (A, G, C, or T) on one end. The genomic clone is then digested with a second restriction enzyme and ligated to an adapter sequence. Using the 4 primer oligonucleotides and the adapter sequence as primers, the genomic sequence of interest can be amplified (e.g., using PCR) from the genomic clones. The PCR amplified sequence can then be used to probe (e.g., in a Southern blot) the cDNA library to obtain the whole cDNA of interest.

For example, in one embodiment, a 10 mer high density array is designed so that it comprises all possible combinations of 10 mer oligonucleotides (i.e., 4^10 = 1,048,576 nucleic acids) and, at the beginning of each oligonucleotide, a constant sequence (e.g., 3'-TAGT-5'), the first 4 bases of which are complementary to the recognition sequence of a restriction enzyme (e.g., Sau 3A plus one base T).

Complete digestion of a large genomic clone or a simplified cDNA library (e.g., a cDNA library that only includes parts of the 5' end or 3' end of whole mRNA) with, for example, a 4 cutter enzyme (illustrated herein by Sau 3A) generates DNA fragments with a 5' overhang sequence (for Sau 3A, the overhang is GATC). The recognition site exists at approximately every 500 bp.
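Because Sau 3A cuts immediately before its GATC recognition sequence (^GATC), a complete digest can be simulated on the top strand alone. The Python sketch below is a simplification that ignores the bottom strand; the function name and the test sequence are hypothetical. Every fragment after the first begins with the GATC that forms its 5' overhang.

```python
def sau3a_fragments(seq: str) -> list[str]:
    """Cut a top-strand sequence before every GATC (Sau 3A cuts ^GATC).

    Only the top strand is modeled; each internal fragment therefore begins
    with the GATC that forms its 5' overhang.
    """
    seq = seq.upper()
    # GATC is palindromic and cannot overlap itself, so a simple scan suffices.
    cuts = [i for i in range(1, len(seq) - 3) if seq[i:i + 4] == "GATC"]
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(sau3a_fragments("AAGATCTTGATCCC"))  # ['AA', 'GATCTT', 'GATCCC']
```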

When the DNA fragments are hybridized with the 10 mer chip in the presence of all possible combinations of a ligatable oligonucleotide of a particular length (e.g., a 6 mer) and a T4 DNA ligase, the ligatable oligonucleotide is ligated onto the probe oligonucleotide.

The DNA is then stripped off the chip and generic difference screening is performed as described above. This permits identification of probe oligonucleotides that hybridize to nucleic acids that are present at different levels in the tested samples.

Based on the 15 bp sequence in this example (5 constant region bases plus the 10 mer) from the probes of interest in the array, four 16 base primers are produced by adding one base (A, G, C, or T) at the end. Using these primers and adaptor sequences as primers, the genomic sequence of interest can be amplified. The amplified sequence can then be used to probe a cDNA library to obtain the whole cDNA of interest as described above.



IX. Signal Evaluation.

A) Signal Evaluation for Expression Monitoring.
One of skill in the art will appreciate that methods for evaluating the hybridization results vary with the nature of the specific probe nucleic acids used as well as the controls provided. In the simplest embodiment, the fluorescence intensity for each probe is simply quantified. This is accomplished by measuring probe signal strength at each location (representing a different probe) on the high density array (e.g., where the label is a fluorescent label, by detecting the amount of fluorescence (intensity) produced by a fixed excitation illumination at each location on the array). Comparison of the absolute intensities of an array hybridized to nucleic acids from a "test" sample with intensities produced by a "control" sample provides a measure of the relative abundance of the nucleic acids that hybridize to each of the probes.

One of skill in the art, however, will appreciate that hybridization signals will vary in strength with efficiency of hybridization, the amount of label on the sample nucleic acid and the amount of the particular nucleic acid in the sample. Typically, nucleic acids present at very low levels (e.g., < 1 pM) will show a very weak signal. At some low level of concentration, the signal becomes virtually indistinguishable from background. In evaluating the hybridization data, a threshold intensity value may be selected below which a signal is not counted, being essentially indistinguishable from background.

Where it is desirable to detect nucleic acids expressed at lower levels, a lower threshold is chosen. Conversely, where only high expression levels are to be evaluated, a higher threshold level is selected. In a preferred embodiment, a suitable threshold is about 10% above that of the average background signal.
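A minimal Python sketch of the thresholding and test/control comparison described above, assuming intensities are held in dictionaries keyed by probe name. The function names are hypothetical, and requiring a probe to clear the threshold in both samples is one design choice among several: it avoids dividing by background-level control signals, but it also drops probes that are detectable in only one sample.

```python
from statistics import mean


def detection_threshold(background: list[float], margin: float = 0.10) -> float:
    """Intensity threshold set `margin` (default 10%) above the mean background signal."""
    return mean(background) * (1.0 + margin)


def relative_abundance(test: dict[str, float], control: dict[str, float],
                       threshold: float) -> dict[str, float]:
    """Test/control intensity ratio for each probe whose signal exceeds the threshold
    in both samples; probes at or below the threshold are treated as background."""
    return {
        probe: test[probe] / control[probe]
        for probe in test
        if probe in control and test[probe] > threshold and control[probe] > threshold
    }


cutoff = detection_threshold([95.0, 100.0, 105.0])  # about 110
print(relative_abundance({"p1": 440.0, "p2": 50.0}, {"p1": 220.0, "p2": 300.0}, cutoff))
# {'p1': 2.0}
```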

In addition, the provision of appropriate controls permits a more detailed analysis that controls for variations in hybridization conditions, cell health, non-specific binding and the like. Thus, for example, in a preferred embodiment, the hybridization array is provided with normalization controls as described above in Section IV(A)(2). These normalization controls are probes complementary to control sequences added in a known concentration to the sample. Where the overall hybridization conditions are poor, the normalization controls will show a smaller signal reflecting reduced hybridization. Conversely, where hybridization conditions are good, the normalization controls provide a higher signal reflecting the improved hybridization. Normalization of the signal derived from other probes in the array to the normalization controls thus provides a control for variations in array synthesis or in hybridization conditions. Typically, normalization is accomplished by dividing the measured signal from the other probes in the array by the average signal produced by the normalization controls. Normalization may also include correction for variations due to sample preparation and amplification. Such normalization may be accomplished by dividing the measured signal by the average signal from the sample preparation/amplification control probes (e.g., the BioB probes). The resulting values may be multiplied by a constant value to scale the results.
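The division-and-scaling step can be sketched in a few lines of Python. The probe names below are hypothetical placeholders; the same function could equally be called with the sample preparation/amplification control probes (e.g., the BioB probes) as the control set.

```python
from statistics import mean


def normalize(signals: dict[str, float], control_probes: set[str],
              scale: float = 1.0) -> dict[str, float]:
    """Divide each non-control probe signal by the mean control signal, then multiply by `scale`."""
    reference = mean(signals[p] for p in control_probes)
    return {p: scale * s / reference for p, s in signals.items() if p not in control_probes}


raw = {"ctrl1": 200.0, "ctrl2": 400.0, "geneA": 600.0, "geneB": 150.0}
print(normalize(raw, {"ctrl1", "ctrl2"}, scale=100.0))  # {'geneA': 200.0, 'geneB': 50.0}
```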



As indicated above, the high density array can include mismatch controls or, in the case of generic difference screening arrays, pairs of related oligonucleotide probes differing in one or more preselected nucleotides. In preferred expression monitoring arrays, there is a mismatch control having a central mismatch for every probe (except the normalization controls) in the array. It is expected that after washing in stringent conditions, where a perfect match would be expected to hybridize to the probe but not to the mismatch, the signal from the mismatch controls should primarily reflect non-specific binding or the presence in the sample of a nucleic acid that hybridizes with the mismatch. In expression monitoring analyses, where both the probe in question and its corresponding mismatch control show high signals, or the mismatch shows a higher signal than its corresponding test probe, the signal from those probes is preferably ignored. The difference in hybridization signal intensity between the target-specific probe and its corresponding mismatch control is a measure of the discrimination of the target-specific probe. Thus, in a preferred embodiment, the signal of the mismatch probe is subtracted from the signal from its corresponding test probe to provide a measure of the signal due to specific binding of the test probe. Similarly, as discussed below, in generic difference screening the difference between probe pairs is calculated.

The concentration of a particular sequence can then be determined by measuring the signal intensity of each of the probes that bind specifically to that nucleic acid and normalizing to the normalization controls. Where the signal from the probe is greater than the mismatch, the mismatch is subtracted; where the mismatch intensity is equal to or greater than its corresponding test probe, the signal is ignored. The expression level of a particular gene can then be scored by the number of positive signals (either absolute or above a threshold value), the intensity of the positive signals (either absolute or above a selected threshold value), or a combination of both metrics (e.g., a weighted average).
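The subtraction rule and the two scoring metrics can be sketched as follows, assuming each probe is given as a (perfect match, mismatch) intensity pair. The function names and the default threshold are hypothetical. This sketch implements only the "mismatch at least as bright" exclusion, not the separate rule for pairs in which both signals are high, and it returns the count and mean intensity separately rather than choosing weights for a combined score.

```python
from typing import Optional


def specific_signal(pm: float, mm: float) -> Optional[float]:
    """Perfect match minus mismatch; None (signal ignored) when the mismatch is at least as bright."""
    if mm >= pm:
        return None
    return pm - mm


def score_gene(pairs: list[tuple[float, float]], threshold: float = 0.0) -> tuple[int, float]:
    """Return (number of positive probe pairs, mean specific signal of those pairs).

    A pair counts as positive when PM - MM exists and exceeds `threshold`.
    """
    diffs = []
    for pm, mm in pairs:
        d = specific_signal(pm, mm)
        if d is not None and d > threshold:
            diffs.append(d)
    mean_signal = sum(diffs) / len(diffs) if diffs else 0.0
    return len(diffs), mean_signal


print(score_gene([(500.0, 100.0), (300.0, 350.0), (200.0, 50.0)]))  # (2, 275.0)
```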

It is a surprising discovery of this invention that normalization controls are often unnecessary for useful quantification of a hybridization signal. Thus, where optimal probes have been identified in the two step selection process as described above in Section IV(B)(ii)(a), the average hybridization signal produced by the selected optimal probes provides a good quantified measure of the concentration of hybridized nucleic acid.

B) Signal evaluation for generic difference screening.
Signal evaluation for generic difference screening is performed in essentially the same manner as expression monitoring described above. However, data are evaluated on a probe-by-probe basis rather than a gene-by-gene basis.

In a preferred embodiment, for each probe oligonucleotide the signal intensity difference between the members of each probe pair (k) is calculated as:

$$X_{ijk1} - X_{ijk2}$$

where X is the hybridization intensity of the probe, i indicates which sample (in this case sample 1 or 2), j indicates which replicate for each sample (in the case of Example 7, where there were two replicates for each nucleic acid sample, j is 1 or 2), k is the probe pair ID number (in the case of Example 7, 1 ... 34,320), and the final subscript indicates the member of the probe pair: 1 for one member and 2 for the other member.

The difference between replicates in the signal intensity difference for each probe pair is then calculated for each sample. Thus, for example, the differences between replicate 1 and replicate 2 of sample 1 (e.g., the normal cell line) and between replicate 1 and replicate 2 of sample 2 (e.g., the tumor cell line) for each probe are calculated as

$$(X_{11k1} - X_{11k2}) - (X_{12k1} - X_{12k2}) \quad \text{for Sample 1}$$

$$(X_{21k1} - X_{21k2}) - (X_{22k1} - X_{22k2}) \quad \text{for Sample 2}$$

for k = 1 to the total number of probes.

The replicates can be normalized to each other using the ratios

$$\frac{X_{11k1} - X_{11k2}}{X_{12k1} - X_{12k2}} \quad \text{for Sample 1}, \qquad \frac{X_{21k1} - X_{21k2}}{X_{22k1} - X_{22k2}} \quad \text{for Sample 2}$$

for all probe pairs (i.e., after normalization, the average ratio should approximate 1).

Finally, the difference between samples 1 and 2, averaged over the two replicates, is calculated as

$$\frac{(X_{21k1} - X_{21k2}) + (X_{22k1} - X_{22k2})}{2} - \frac{(X_{11k1} - X_{11k2}) + (X_{12k1} - X_{12k2})}{2}$$

after normalization between the two samples based on the average ratio of

$$\frac{[(X_{21k1} - X_{21k2}) + (X_{22k1} - X_{22k2})]/2}{[(X_{11k1} - X_{11k2}) + (X_{12k1} - X_{12k2})]/2}.$$

These data are plotted as a function of probe number (ID), and probes having differentially hybridized nucleic acids are readily discernible (see, e.g., Fig. 16c).
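The calculation above can be sketched in Python for two samples with two replicates each, assuming every replicate is a list of (member 1, member 2) intensity pairs indexed by k. The names are hypothetical, and normalization is implemented here as dividing by the mean ratio so that the average ratio becomes exactly 1; probe pairs whose reference difference is zero are skipped when computing that factor.

```python
from statistics import mean

Pair = tuple[float, float]  # (member 1, member 2) intensities of one probe pair


def pair_diffs(replicate: list[Pair]) -> list[float]:
    """X_ijk1 - X_ijk2 for every probe pair k in one replicate."""
    return [m1 - m2 for m1, m2 in replicate]


def scale_to(reference: list[float], other: list[float]) -> list[float]:
    """Rescale `other` so that the mean of other/reference over all pairs is 1.

    Pairs whose reference difference is zero are skipped when computing the factor.
    """
    factor = mean(o / r for r, o in zip(reference, other) if r != 0)
    return [o / factor for o in other]


def sample_difference(sample1: tuple[list[Pair], list[Pair]],
                      sample2: tuple[list[Pair], list[Pair]]) -> list[float]:
    """Per probe pair: replicate-averaged difference of sample 2 minus sample 1."""
    d11, d12 = pair_diffs(sample1[0]), pair_diffs(sample1[1])
    d21, d22 = pair_diffs(sample2[0]), pair_diffs(sample2[1])
    # Normalize replicate 1 to replicate 2 within each sample (mean ratio -> 1).
    d11 = scale_to(d12, d11)
    d21 = scale_to(d22, d21)
    avg1 = [(a + b) / 2 for a, b in zip(d11, d12)]
    avg2 = [(a + b) / 2 for a, b in zip(d21, d22)]
    # Normalize sample 2 to sample 1 using the mean of avg2/avg1.
    avg2 = scale_to(avg1, avg2)
    return [b - a for a, b in zip(avg1, avg2)]


rep = [(300.0, 100.0), (500.0, 100.0), (200.0, 100.0)]  # pair differences 200, 400, 100
doubled = [(2 * a, 2 * b) for a, b in rep]             # uniform two-fold signal
print(sample_difference((rep, rep), (doubled, doubled)))  # [0.0, 0.0, 0.0]
```

A uniform two-fold change in overall signal is removed by the normalization, so only probe pairs whose relative signal changes between samples produce non-zero differences.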

However, the data may also be filtered to reduce background signal. In this instance, after normalization between replicates (see above), the ratio for the replicates of sample 1 is calculated as follows: if

$$\left|\frac{X_{11k1} - X_{11k2}}{X_{12k1} - X_{12k2}}\right| \ge 1,$$

then the ratio is $(X_{11k1} - X_{11k2})/(X_{12k1} - X_{12k2})$; otherwise the ratio is the inverse, $(X_{12k1} - X_{12k2})/(X_{11k1} - X_{11k2})$.

The ratio of replicates 1 and 2 of sample 2 for the difference of each oligonucleotide pair is calculated in the same way, but based on the absolute value of

$$\frac{X_{21k1} - X_{21k2}}{X_{22k1} - X_{22k2}} \quad \text{and} \quad \frac{X_{22k1} - X_{22k2}}{X_{21k1} - X_{21k2}}.$$

Finally, as above, the ratio of sample 1 and sample 2, averaged over two replicates, for the difference of each oligonucleotide pair is calculated as in Fig. 17a, but based on the absolute value of

$$\frac{[(X_{21k1} - X_{21k2}) + (X_{22k1} - X_{22k2})]/2}{[(X_{11k1} - X_{11k2}) + (X_{12k1} - X_{12k2})]/2} \quad \text{and} \quad \frac{[(X_{11k1} - X_{11k2}) + (X_{12k1} - X_{12k2})]/2}{[(X_{21k1} - X_{21k2}) + (X_{22k1} - X_{22k2})]/2}$$

after normalization as described above.

The oligonucleotide pairs that show the greatest differential hybridization between the two samples can be identified by sorting the observed hybridization ratio and difference values. The oligonucleotides that show the largest change (increase or decrease) can be readily seen from the ratio plot (see, Fig. 17c).
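The "ratio or its inverse" rule folds increases and decreases onto a single scale of magnitude at least 1, which makes sorting for the largest changes straightforward. A minimal Python sketch, with hypothetical function names, assuming the replicate-averaged, normalized pair differences for each sample are already available as lists indexed by k. Folding discards the direction of change; the sign of the corresponding difference value recovers it.

```python
def folded_ratio(a: float, b: float) -> float:
    """a/b if |a/b| >= 1, otherwise b/a, so increases and decreases share one scale."""
    r = a / b
    return r if abs(r) >= 1 else 1 / r


def rank_pairs(avg1: list[float], avg2: list[float], top: int = 10) -> list[tuple[int, float]]:
    """Probe-pair indices sorted by the magnitude of the folded sample 2 / sample 1 ratio."""
    scored = [(k, folded_ratio(b, a))
              for k, (a, b) in enumerate(zip(avg1, avg2)) if a != 0 and b != 0]
    scored.sort(key=lambda kv: abs(kv[1]), reverse=True)
    return scored[:top]


print(rank_pairs([100.0, 100.0, 100.0], [100.0, 400.0, 25.0], top=2))  # [(1, 4.0), (2, 4.0)]
```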



X. Identification of Genes Whose Expression Is Altered.

As indicated above, the nucleic acid sequences of the probe oligonucleotides comprising the high density arrays are known. The sequences of the probes showing the largest hybridization differences (and families of such differences) can be used to identify the differentially expressed genes in the compared samples by any of a number of means.

Thus, for example, sequences of the differentially hybridizing probes may be used to search a nucleic acid database (e.g., by a BLAST or related search of the fragments against all known sequences). Alternatively, some sequence reconstruction using the families of probes that change by similar amounts can also be done. The database search for known genes that include sequences complementary (or nearly complementary) to the probes that change the most is not difficult and, because it is generally easier than sequence reconstruction, is the preferred method for identifying the differentially expressed sequences.
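As a toy illustration of the database lookup, the Python below scans a small in-memory collection of sequences for a probe or its reverse complement. It is a stand-in for a BLAST search only in the crudest sense: it reports exact matches, does not find the "nearly complementary" hits a real alignment search would, and applies no scoring. The database contents, identifiers, and function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_probe_hits(probe: str, database: dict[str, str]) -> list[str]:
    """IDs of database sequences containing the probe or its reverse complement (exact match)."""
    probe = probe.upper()
    rc = reverse_complement(probe)
    return [name for name, seq in database.items()
            if probe in seq.upper() or rc in seq.upper()]


db = {"geneA": "TTTACGTTGCAGGG", "geneB": "CCCCCCCC", "geneC": "AATGCAACGTAA"}
print(find_probe_hits("ACGTTGCA", db))  # ['geneA', 'geneC']
```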

In another embodiment, the differential hybridization pattern indicates that there are significant differences in the overall expression profile(s) between the tested samples, and identifies probes that are specific for the differences. These probes can be used as specific affinity reagents to extract from the samples the parts that differ. This can be accomplished in several ways:

In one approach, the material hybridized to the probes that show the greatest differences between samples can be micro-extracted from the high density array. For example, the hybridized nucleic acids can be removed using small capillaries. Alternatively, probes that are anchored to the chip with a photolabile linker can be released by selective irradiation at the desired parts of the high density array.

In another approach, because the sequence of all the probes on the high density array is known, and the probes that hybridize differentially have been identified, the latter can be used as affinity reagents to extract the nucleic acids that differentially hybridize in the test samples. Once the differentially hybridizing probes are identified in the array, the probe (or probes) can be synthesized on beads (or other solid support) and hybridized to the samples (not necessarily fragmented for this step; full-length clones may be desirable). The material that is extracted can be cloned and/or sequenced, according to standard methods known to those of skill in the art, to obtain the desired information about the differentially expressed species (e.g., clones can be screened with labeled oligonucleotides to identify those with the appropriate sequence, which can then be sequenced).
In still another approach, the sequence of the hybridized probes of interest can be used to generate amplification primers (e.g., reverse transcription and/or PCR primers). The differentially expressed sequence can then be amplified and used to probe a genomic or cDNA library using sequence specific primers determined from the array in combination with specific sequences added during a reverse transcriptase cDNA step as described above (e.g., primers based on poly A or added 3' sequence). Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through many cloning exercises, are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA; Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3; and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (1994 Supplement) (Ausubel). Product information from manufacturers of biological reagents and experimental equipment also provides information useful in known biological methods. Such manufacturers include the SIGMA Chemical Company (Saint Louis, MO), R&D Systems (Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), Invitrogen (San Diego, CA), and Applied Biosystems (Foster City, CA), as well as many other commercial sources known to one of skill.

In short, using the above-described method, differentially expressed genes can be identified without prior assumptions about which genes to monitor and without prior knowledge of sequence. Once identified (and sequenced if not a previously sequenced gene), the new sequences can be included in a high density array designed to detect and quantify specific genes as described in copending applications No. 08/529,115 filed on September 15, 1995 and PCT/US96/14839. Thus, the two approaches are complementary in that one can be used broadly to search for expression differences of perhaps unknown genes, while the other is used to more specifically monitor those genes that have been chosen as important or those genes that have been previously at least partially sequenced.

XI. Kits for Expression Monitoring and Generic Difference Screening. 

In another embodiment, this invention provides kits for expression monitoring and/or generic difference screening. The kits include, but are not limited to, a container or containers containing one or more high density oligonucleotide arrays of this invention. Preferred kits for generic difference screening include at least two high density arrays. The kits can also include a label or labels for labeling one or more nucleic acid samples. In addition, the kits can include one or more ligatable oligonucleotides. In certain embodiments, the kit contains pools of different ligatable oligonucleotides, preferably pools of every possible oligonucleotide of a particular length (e.g., all possible 6 mers) or sets of specific ligatable oligonucleotides. One of skill in the art will appreciate that the kits may include any other of the various blocking reagents, labels, devices (e.g., trays, microscope filters, syringes, etc.), buffers, and the like useful for performing the hybridizations and ligation reactions described herein. In addition, the kits may include software provided on a storage medium (e.g., optical or magnetic disk) for the selection of probes and/or the analysis of hybridization data as described herein. In addition, the kits may contain instructional materials teaching the use of the kit in the various methods of this invention (e.g., in practice of various expression monitoring methods or generic difference screening methods described herein).

XII. Computer-Implemented Expression Monitoring.

The methods of monitoring gene expression of this invention may be performed utilizing a computer. The computer typically runs a software program that includes computer code incorporating the invention for analyzing hybridization intensities measured from a substrate or chip and thus monitoring the expression of one or more genes or screening for differences in nucleic acid abundances. Although the following will describe specific embodiments of the invention, the invention is not limited to any one embodiment, so the following is for purposes of illustration and not limitation.
Fig. 6 illustrates an example of a computer system used to execute the
software of an embodiment of the present invention. As shown, computer system
100 includes a monitor 102, screen 104, cabinet 106, keyboard 108, and mouse 110.
Mouse 110 may have one or more buttons such as mouse buttons 112. Cabinet 106 houses
a CD-ROM drive 114, a system memory and a hard drive (both shown in Fig. 7) which
may be utilized to store and retrieve software programs incorporating computer code that
implements the invention, data for use with the invention, and the like. Although a
CD-ROM 116 is shown as an exemplary computer readable storage medium, other computer
readable storage media including floppy disks, tape, flash memory, system memory, and
hard drives may be utilized. Cabinet 106 also houses familiar computer components (not
shown) such as a central processor, system memory, hard disk, and the like.

Fig. 7 shows a system block diagram of computer system 100 used to
execute the software of an embodiment of the present invention. As in Fig. 6, computer
system 100 includes monitor 102 and keyboard 108. Computer system 100 further
includes subsystems such as a central processor 120, system memory 122, I/O controller
124, display adapter 126, removable disk 128 (e.g., CD-ROM drive), fixed disk 130 (e.g.,
hard drive), network interface 132, and speaker 134. Other computer systems suitable for
use with the present invention may include additional or fewer subsystems. For example,
another computer system could include more than one processor 120 (i.e., a
multi-processor system) or a cache memory.

Arrows such as 136 represent the system bus architecture of computer
system 100. However, these arrows are illustrative of any interconnection scheme serving
to link the subsystems. For example, a local bus could be utilized to connect the central
processor to the system memory and display adapter. Computer system 100 shown in
Fig. 7 is but an example of a computer system suitable for use with the present invention.
Other configurations of subsystems suitable for use with the present invention will be
readily apparent to one of ordinary skill in the art.

Fig. 8 shows a flowchart of a process of monitoring the expression of a
gene. The process compares hybridization intensities of pairs of perfect match and
mismatch probes that are preferably covalently attached to the surface of a substrate or
chip. Most preferably, the nucleic acid probes have a density greater than about 60
different nucleic acid probes per 1 cm² of the substrate. Although the flowcharts show a
sequence of steps for clarity, this is not an indication that the steps must be performed in
this specific order. One of ordinary skill in the art would readily recognize that many of
the steps may be reordered, combined, and deleted without departing from the invention.

Initially, nucleic acid probes are selected that are complementary to the
target sequence (or gene). These probes are the perfect match probes. Another set of
probes is specified that are intended to be not perfectly complementary to the target
sequence. These probes are the mismatch probes, and each mismatch probe includes at
least one nucleotide mismatch from a perfect match probe. Accordingly, a mismatch probe
and the perfect match probe from which it was derived make up a pair of probes. As
mentioned earlier, the nucleotide mismatch is preferably near the center of the mismatch
probe.
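As an illustration only, deriving a mismatch probe from a perfect match probe can be sketched in Python. The text requires only that the mismatch probe differ by at least one nucleotide, preferably near the center; placing a single substitution at the central position and using the complementary base are assumptions of this sketch, and the function names are hypothetical.

```python
# Hypothetical helpers: derive a mismatch (MM) probe from a perfect match (PM) probe.
# Assumption: one substitution at the central position, using the complementary base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm_probe: str) -> str:
    """Return a copy of pm_probe whose central base is replaced by its complement."""
    pm_probe = pm_probe.upper()
    center = len(pm_probe) // 2  # 0-based index 10 for a 20-mer
    return pm_probe[:center] + COMPLEMENT[pm_probe[center]] + pm_probe[center + 1:]


def probe_pair(pm_probe: str) -> tuple:
    """Return the (PM, MM) pair that the analysis later compares."""
    return pm_probe.upper(), mismatch_probe(pm_probe)


if __name__ == "__main__":
    pm, mm = probe_pair("ACGTACGTACGTACGTACGT")
    print(pm)
    print(mm)  # differs from pm only at index 10
```

Any other substitution rule that yields a single central mismatch would serve the same purpose in the pairwise comparison.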

The probe lengths of the perfect match probes are typically chosen to
exhibit high hybridization affinity with the target sequence. For example, the nucleic acid
probes may all be 20-mers. However, probes of varying lengths may also be synthesized
on the substrate for any number of reasons, including resolving ambiguities.

The target sequence is typically fragmented, labeled and exposed to a
substrate including the nucleic acid probes as described earlier. The hybridization
intensities of the nucleic acid probes are then measured and input into a computer system.
The computer system may be the same system that directs the substrate hybridization or it
may be a different system altogether. Of course, any computer system for use with the
invention should have available other details of the experiment, possibly including the gene
name, gene sequence, probe sequences, probe locations on the substrate, and the like.

Referring to Fig. 8, after hybridization, the computer system receives input
of hybridization intensities of the multiple pairs of perfect match and mismatch probes at
step 202. The hybridization intensities indicate hybridization affinity between the nucleic
acid probes and the target nucleic acid (which corresponds to a gene). Each pair includes a
perfect match probe that is perfectly complementary to a portion of the target nucleic acid
and a mismatch probe that differs from the perfect match probe by at least one nucleotide.
At step 204, the computer system compares the hybridization intensities of
the perfect match and mismatch probes of each pair. If the gene is expressed, the
hybridization intensity (or affinity) of a perfect match probe of a pair should be
recognizably higher than that of the corresponding mismatch probe. Generally, if the
hybridization intensities of a pair of probes are substantially the same, it may indicate the
gene is not expressed. However, the determination is not based on a single pair of probes;
whether a gene is expressed is determined from an analysis of many pairs of
probes. An exemplary process of comparing the hybridization intensities of the pairs of
probes will be described in more detail in reference to Fig. 9.

After the system compares the hybridization intensities of the perfect match
and mismatch probes, the system indicates expression of the gene at step 206. As an
example, the system may indicate to a user that the gene is either present (expressed),
marginal or absent (unexpressed).

Fig. 9 shows a flowchart of a process of determining if a gene is expressed
utilizing a decision matrix. At step 252, the computer system receives raw scan data of N
pairs of perfect match and mismatch probes. In a preferred embodiment, the hybridization
intensities are photon counts from a fluorescein-labeled target that has hybridized to the
probes on the substrate. For simplicity, the hybridization intensity of a perfect match probe
will be designated "Ipm" and the hybridization intensity of a mismatch probe will be
designated "Imm".

Hybridization intensities for a pair of probes are retrieved at step 254. The
background signal intensity is subtracted from each of the hybridization intensities of the
pair at step 256. Background subtraction may also be performed on all the raw scan data at
the same time.

At step 258, the hybridization intensities of the pair of probes are compared
to a difference threshold (D) and a ratio threshold (R). It is determined if the difference
between the hybridization intensities of the pair (Ipm - Imm) is greater than or equal to the
difference threshold AND the quotient of the hybridization intensities of the pair (Ipm / Imm)
is greater than or equal to the ratio threshold. The difference and ratio thresholds are
typically user-defined values that have been determined to produce accurate expression
monitoring of a gene or genes. In one embodiment, the difference threshold is 20 and the
ratio threshold is ...

If Ipm - Imm >= D and Ipm / Imm >= R, the value NPOS is incremented at step
260. In general, NPOS is a value that indicates the number of pairs of probes which have
hybridization intensities indicating that the gene is likely expressed. NPOS is utilized in a
determination of the expression of the gene.
At step 262, it is determined if Imm - Ipm >= D and Imm / Ipm >= R. If this
expression is true, the value NNEG is incremented at step 264. In general, NNEG is a
value that indicates the number of pairs of probes which have hybridization intensities
indicating that the gene is likely not expressed. NNEG, like NPOS, is utilized in a
determination of the expression of the gene.
For each pair that exhibits hybridization intensities indicating that the
gene is either expressed or not expressed, a log ratio value (LR) and an intensity difference
value (IDIF) are calculated at step 266. LR is calculated as the log of the quotient of the
hybridization intensities of the pair (Ipm / Imm). IDIF is calculated as the difference
between the hybridization intensities of the pair (Ipm - Imm). If there is a next pair of
hybridization intensities at step 268, they are retrieved at step 254.
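The per-pair tests of steps 254 through 268 can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the base of the logarithm is not stated in the text (base 10 is assumed), the ratio threshold R is a required argument because no value for it is given here, and the `floor` clamp that keeps ratios defined after background subtraction is a hypothetical addition.

```python
import math


def classify_pairs(pairs, r, d=20.0, background=0.0, floor=1.0):
    """Tally informative probe pairs as in Fig. 9, steps 254-268.

    pairs      -- iterable of (ipm, imm) raw intensities for one gene
    r          -- ratio threshold R (no default: its value is not given)
    d          -- difference threshold D (20 in one embodiment)
    background -- background intensity subtracted from each value (step 256)
    floor      -- hypothetical lower bound that keeps ratios and logs defined
    Returns (npos, nneg, log_ratios, idifs).
    """
    npos = nneg = 0
    log_ratios, idifs = [], []
    for ipm_raw, imm_raw in pairs:
        ipm = max(ipm_raw - background, floor)
        imm = max(imm_raw - background, floor)
        if ipm - imm >= d and ipm / imm >= r:      # step 258 -> step 260
            npos += 1
        elif imm - ipm >= d and imm / ipm >= r:    # step 262 -> step 264
            nneg += 1
        else:
            continue  # uninformative pair: no LR or IDIF is recorded
        log_ratios.append(math.log10(ipm / imm))   # LR (base 10 assumed)
        idifs.append(ipm - imm)                    # IDIF
    return npos, nneg, log_ratios, idifs
```

For example, `classify_pairs([(200, 50), (60, 55), (40, 150), (500, 100)], r=1.5)` counts two positive pairs and one negative pair, and ignores the (60, 55) pair entirely.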

At step 272, a decision matrix is utilized to indicate if the gene is expressed.
The decision matrix utilizes the values N, NPOS, NNEG, and LR (multiple LRs). The
following three assignments are performed:

P1 = NPOS / NNEG
P2 = NPOS / N
P3 = (10 * SUM(LR)) / (NPOS + NNEG)

These P values are then utilized to determine if the gene is expressed.

For purposes of illustration, the P values are broken down into ranges. If P1
is greater than or equal to 2.1, then A is true. If P1 is less than 2.1 and greater than or
equal to 1.8, then B is true. Otherwise, C is true. Thus, P1 is broken down into three
ranges. P2 and P3 are broken down into ranges as follows:

X = (P2 >= 0.35)
Y = (0.35 > P2 >= 0.20)
Z = (P2 < 0.20)

Q = (P3 >= 1.5)
R = (1.5 > P3 >= 1.1)
S = (P3 < 1.1)

Once the P values are broken down into ranges according to the above boolean values, the
gene expression is determined.
The gene expression is indicated as present (expressed), marginal or absent
(not expressed). The gene is indicated as expressed if the following expression is true: A
and (X or Y) and (Q or R). In other words, the gene is indicated as expressed if P1 >= 2.1,
P2 >= 0.20 and P3 >= 1.1. Additionally, the gene is indicated as expressed if the following
expression is true: B and X and Q.
With the foregoing explanation, the following is a summary of the gene
expression indications:

Present      A and (X or Y) and (Q or R)
             B and X and Q

Marginal     A and X and S
             B and X and R
             B and Y and (Q or R)

Absent       All other cases (e.g., any C combination)

In the output to the user, present may be indicated as "P," marginal as "M" and absent as
"A" at step 274.
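The decision matrix of steps 272 and 274 can be sketched as a single Python function. It is a sketch under stated assumptions, not the patented implementation: the text does not say how to handle a zero denominator, so P1 is treated here as unbounded when NNEG is zero (and as zero when no pair is informative), and P3 is taken as zero when NPOS + NNEG is zero. The function name is hypothetical.

```python
def absolute_call(n, npos, nneg, log_ratios):
    """Return "P", "M" or "A" from the Fig. 9 decision matrix (steps 272-274).

    n          -- total number of probe pairs N
    npos, nneg -- counts of positive and negative pairs
    log_ratios -- LR values of the informative pairs
    """
    # Assumed handling of zero denominators (not specified in the text).
    if nneg:
        p1 = npos / nneg
    else:
        p1 = float("inf") if npos else 0.0
    p2 = npos / n
    informative = npos + nneg
    p3 = 10 * sum(log_ratios) / informative if informative else 0.0

    a, b = p1 >= 2.1, 2.1 > p1 >= 1.8             # C is everything else
    x, y = p2 >= 0.35, 0.35 > p2 >= 0.20          # Z is everything else
    q, r, s = p3 >= 1.5, 1.5 > p3 >= 1.1, p3 < 1.1

    if (a and (x or y) and (q or r)) or (b and x and q):
        return "P"
    if (a and x and s) or (b and x and r) or (b and y and (q or r)):
        return "M"
    return "A"
```

Because every "present" and "marginal" row requires A or B, any C combination falls through to "A", matching the summary table.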

Once all the pairs of probes have been processed and the expression of the
gene indicated, an average of ten times the LRs is computed at step 275. Additionally, an
average of the IDIF values for the probes that incremented NPOS and NNEG is calculated.
These values may be utilized for quantitative comparisons of this experiment with other
experiments.
Quantitative measurements may be performed at step 276. For example, the
current experiment may be compared to a previous experiment (e.g., utilizing values
calculated at step 270). Additionally, the experiment may be compared to hybridization
intensities of RNA (such as from bacteria) present in the biological sample in a known
quantity. In this manner, one may verify the correctness of the gene expression indication
or call, modify threshold values, or perform any number of modifications of the preceding.

For simplicity, Fig. 9 was described in reference to a single gene. However,
the process may be utilized on multiple genes in a biological sample. Therefore, any
discussion of the analysis of a single gene is not an indication that the process may not be
extended to processing multiple genes.

Figs. 10A and 10B show the flow of a process of determining the
expression of a gene by comparing baseline scan data and experimental scan data. For
example, the baseline scan data may be from a biological sample where it is known the
gene is expressed. Thus, this scan data may be compared to a different biological sample
to determine if the gene is expressed. Additionally, it may be determined how the
expression of a gene or genes changes over time in a biological organism.

At step 302, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the baseline. The hybridization intensity of a
perfect match probe from the baseline will be designated "Ipm" and the hybridization
intensity of a mismatch probe from the baseline will be designated "Imm". The background
signal intensity is subtracted from each of the hybridization intensities of the pairs of
baseline scan data at step 304.

At step 306, the computer system receives raw scan data of N pairs of
perfect match and mismatch probes from the experimental biological sample. The
hybridization intensity of a perfect match probe from the experiment will be designated
"Jpm" and the hybridization intensity of a mismatch probe from the experiment will be
designated "Jmm". The background signal intensity is subtracted from each of the
hybridization intensities of the pairs of experimental scan data at step 308.

The hybridization intensities of an I and J pair may be normalized at step
310. For example, the hybridization intensities of the I and J pairs may be divided by the
hybridization intensity of control probes as discussed above in Section IV(A).
At step 312, the hybridization intensities of the I and J pair of probes are
compared to a difference threshold (DDIF) and a ratio threshold (RDIF). It is determined
if the difference between the hybridization intensities of the one pair (Jpm - Jmm) and the
other pair (Ipm - Imm) is greater than or equal to the difference threshold AND the quotient
of the hybridization intensities of one pair (Jpm - Jmm) and the other pair (Ipm - Imm) is
greater than or equal to the ratio threshold. The thresholds are typically user-defined
values that have been determined to produce accurate expression monitoring of a
gene or genes.

If (Jpm - Jmm) - (Ipm - Imm) >= DDIF and (Jpm - Jmm) / (Ipm - Imm) >= RDIF, the
value NINC is incremented at step 314. In general, NINC is a value that indicates the
number of pairs of probes indicating that the gene expression is likely greater (or
increased) in the experimental sample than in the baseline sample. NINC is utilized in a
determination of whether the expression of the gene is greater (or increased), less (or
decreased) or did not change in the experimental sample compared to the baseline sample.

At step 316, it is determined if (Ipm - Imm) - (Jpm - Jmm) >= DDIF and
(Ipm - Imm) / (Jpm - Jmm) >= RDIF. If this expression is true, NDEC is incremented. In
general, NDEC is a value that indicates the number of pairs of probes indicating that the
gene expression is likely less (or decreased) in the experimental sample than in the baseline
sample. NDEC is utilized in a determination of whether the expression of the gene is greater
(or increased), less (or decreased) or did not change in the experimental sample compared
to the baseline sample.
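Steps 310 through 316 can be sketched in Python as below. This sketch assumes that the decrease test at step 316 mirrors the increase test at step 312 with the baseline (I) and experimental (J) pairs swapped; the `floor` clamp on each PM - MM difference is a hypothetical safeguard against dividing by zero or by a negative value, and the function names are illustrative only.

```python
def normalize(pairs, control_intensity):
    """Divide every (pm, mm) intensity by a control-probe intensity (step 310)."""
    return [(pm / control_intensity, mm / control_intensity) for pm, mm in pairs]


def compare_pairs(baseline, experiment, ddif, rdif, floor=1.0):
    """Count NINC and NDEC over aligned baseline (I) and experimental (J) pairs.

    Both inputs are lists of background-subtracted (pm, mm) intensities in the
    same probe order. `floor` is a hypothetical lower bound on each PM - MM
    difference so that the ratio tests never divide by zero or a negative value.
    """
    if len(baseline) != len(experiment):
        raise ValueError("baseline and experiment must have the same number of pairs")
    ninc = ndec = 0
    for (ipm, imm), (jpm, jmm) in zip(baseline, experiment):
        i_diff = max(ipm - imm, floor)
        j_diff = max(jpm - jmm, floor)
        if j_diff - i_diff >= ddif and j_diff / i_diff >= rdif:    # step 312 -> 314
            ninc += 1
        elif i_diff - j_diff >= ddif and i_diff / j_diff >= rdif:  # step 316
            ndec += 1
    return ninc, ndec
```

With baseline pairs of (300, 100) and experimental pairs of (900, 100), (150, 100) and (320, 100), DDIF = 50 and RDIF = 1.5, one pair counts as increased, one as decreased, and the third (a difference of only 20) counts as neither.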

For each of the pairs that exhibits hybridization intensities indicating that
the gene is expressed more or less in the experimental sample, the values NPOS, NNEG
and LR are calculated for each pair of probes. These values are calculated as discussed
above in reference to Fig. 9. A suffix of either "B" or "E" has been added to each value in
order to indicate if the value denotes the baseline sample or the experimental sample,
respectively. If there are further pairs of hybridization intensities, they are
processed in a similar manner as shown.

Referring now to Fig. 10B, an absolute decision computation is performed
for both the baseline and experimental samples at step 324. The absolute decision
computation is an indication of whether the gene is expressed, marginal or absent in each
of the baseline and experimental samples. Accordingly, in a preferred embodiment, this
step entails performing steps 272 and 274 from Fig. 9 for each of the samples. This being
done, there is an indication of gene expression for each of the samples taken alone.

At step 326, a decision matrix is utilized to determine the difference in gene
expression between the two samples. This decision matrix utilizes the values N, NPOSB,
NPOSE, NNEGB, NNEGE, NINC, NDEC, LRB, and LRE as they were calculated above.
The decision matrix performs different calculations depending on whether NINC is greater
than or equal to NDEC. The calculations are as follows.

If NINC >= NDEC, the following four P values are determined:

P1 = NINC / NDEC
P2 = NINC / N
P3 = ((NPOSE - NPOSB) - (NNEGE - NNEGB)) / N
P4 = 10 * SUM(LRE - LRB) / N

These P values are then utilized to determine the difference in gene expression between the
two samples.

For purposes of illustration, the P values are broken down into ranges as
was done previously. Thus, all of the P values are broken down into ranges according to
the following:

A = (P1 >= 2.7)
B = (2.7 > P1 >= 1.8)
C = (P1 < 1.8)

X = (P2 >= 0.24)
Y = (0.24 > P2 >= 0.16)
Z = (P2 < 0.16)

M = (P3 >= 0.17)
N = (0.17 > P3 >= 0.10)
O = (P3 < 0.10)

Q = (P4 >= 1.3)
R = (1.3 > P4 >= 0.9)
S = (P4 < 0.9)



Once the P values are broken down into ranges according to the above boolean values, the
difference in gene expression between the two samples is determined.

In this case where NINC >= NDEC, the gene expression change is indicated
as increased, marginal increase or no change. The following is a summary of the gene
expression indications:

Increased            A and (X or Y) and (Q or R) and (M or N or O)
                     A and (X or Y) and (Q or R or S) and (M or N)
                     B and (X or Y) and (Q or R) and (M or N)
                     A and X and (Q or R or S) and (M or N or O)

Marginal increase    A and Y and S and O
                     B and (X or Y) and (Q or R) and O
                     B and (X or Y) and S and (M or N)
                     C and (X or Y) and (Q or R) and (M or N)

No change            All other cases (e.g., any Z combination)

In the output to the user, increased may be indicated as "I," marginal increase as "MI" and
no change as "NC."
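The NINC >= NDEC branch of step 326 can be sketched in Python as follows. Several points are assumptions rather than statements of the original method: the first marginal-increase row is read conjunctively as A and Y and S and O (read as a disjunction, it would make Z combinations marginal, contradicting the no change rule); LRB and LRE are taken to be per-pair log ratios aligned by probe pair for SUM(LRE - LRB); and P1 is treated as unbounded when NDEC is zero. The function name is hypothetical.

```python
def increase_call(n, ninc, ndec, nposb, npose, nnegb, nnege, lrb, lre):
    """Return "I", "MI" or "NC" for the NINC >= NDEC branch of step 326.

    lrb, lre -- per-pair log ratios for the baseline and experimental samples,
                aligned by probe pair (assumed pairing for SUM(LRE - LRB)).
    """
    if ninc < ndec:
        raise ValueError("this sketch covers only the NINC >= NDEC branch")
    p1 = ninc / ndec if ndec else float("inf")  # assumed handling of NDEC == 0
    p2 = ninc / n
    p3 = ((npose - nposb) - (nnege - nnegb)) / n
    p4 = 10 * sum(e - b for b, e in zip(lrb, lre)) / n

    a, b, c = p1 >= 2.7, 2.7 > p1 >= 1.8, p1 < 1.8
    x, y = p2 >= 0.24, 0.24 > p2 >= 0.16        # Z is everything else
    m, n_rng, o = p3 >= 0.17, 0.17 > p3 >= 0.10, p3 < 0.10
    q, r, s = p4 >= 1.3, 1.3 > p4 >= 0.9, p4 < 0.9

    increased = (
        (a and (x or y) and (q or r) and (m or n_rng or o))
        or (a and (x or y) and (q or r or s) and (m or n_rng))
        or (b and (x or y) and (q or r) and (m or n_rng))
        or (a and x and (q or r or s) and (m or n_rng or o))
    )
    if increased:
        return "I"
    marginal = (
        (a and y and s and o)
        or (b and (x or y) and (q or r) and o)
        or (b and (x or y) and s and (m or n_rng))
        or (c and (x or y) and (q or r) and (m or n_rng))
    )
    return "MI" if marginal else "NC"
```

The NINC < NDEC branch would follow the same table with the decreased labels, but it is not sketched here because not all of its P values are reproduced in this section.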

If NINC < NDEC, the following four P values are determined:

P1 = NDEC / NINC
P4 = 10 * SUM(LRE - LRB) / N

These P values are then utilized to determine the difference in gene expression between the
two samples.
The P values are broken down into the same ranges as for the other case
where NINC >= NDEC. Thus, the P values in this case indicate the same ranges, which will
not be repeated for the sake of brevity. However, the ranges generally indicate different
changes in the gene expression between the two samples, as shown below.

In this case where NINC < NDEC, the gene expression change is indicated
as decreased, marginal decrease or no change. The following is a summary of the gene
expression indications:

Decreased            A and (X or Y) and (Q or R) and (M or N or O)
                     A and (X or Y) and (Q or R or S) and (M or N)
                     B and (X or Y) and (Q or R) and (M or N)
                     A and X and (Q or R or S) and (M or N or O)

Marginal decrease    A and Y and S and O
                     B and (X or Y) and (Q or R) and O
                     B and (X or Y) and S and (M or N)
                     C and (X or Y) and (Q or R) and (M or N)

No change            All other cases (e.g., any Z combination)

In the output to the user, decreased may be indicated as "D," marginal decrease as "MD"
and no change as "NC."

The above has shown that the relative difference in gene expression between a
baseline sample and an experimental sample may be determined. An additional test may
be performed that would change an I, MI, D, or MD (i.e., not NC) call to NC if the gene is
indicated as expressed in both samples (e.g., from step 324) and the following expressions
are all true:

Average(IDIFB) >= 200
Average(IDIFE) >= 200
1.4 >= Average(IDIFE) / Average(IDIFB) >= 0.7
Thus, when a gene is expressed in both samples, a call of increased or decreased (whether
marginal or not) will be changed to a no change call if the average intensity difference for
each sample is relatively large and substantially the same for both samples. Average(IDIFB)
and Average(IDIFE) are calculated as the sum of all the IDIFs for each sample divided by N.
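The no change override can be sketched as a small Python function. It is illustrative only: "expressed in both samples" is passed in as a boolean taken from the absolute calls of step 324, and the function and parameter names are hypothetical. The averages divide by N, as the text specifies, rather than by the number of IDIF values supplied.

```python
def apply_no_change_override(call, expressed_in_both, idifs_b, idifs_e, n):
    """Demote an I/MI/D/MD call to NC when both samples give large, similar signals.

    expressed_in_both -- True if step 324 called the gene expressed in both samples
    idifs_b, idifs_e  -- IDIF values for the baseline and experimental samples
    n                 -- number of probe pairs N (the averages divide by N)
    """
    if call == "NC" or not expressed_in_both:
        return call
    avg_b = sum(idifs_b) / n
    avg_e = sum(idifs_e) / n
    if avg_b >= 200 and avg_e >= 200 and 0.7 <= avg_e / avg_b <= 1.4:
        return "NC"
    return call
```

For instance, average intensity differences of 400 and 440 (a ratio of 1.1) turn an "I" call into "NC", whereas 400 and 1000 (a ratio of 2.5) leave it unchanged.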
At step 328, values for quantitative difference evaluation are calculated. An
average of ((Jpm - Jmm) - (Ipm - Imm)) over the pairs is calculated. Additionally, a
quotient of the average of Jpm - Jmm and the average of Ipm - Imm is calculated. These
values may be utilized to compare the results with other experiments in step 330.

EXAMPLES

The following examples are offered to illustrate, but not to limit, the present
invention.

Example 1

First Generation Oligonucleotide Arrays Designed to Measure mRNA
Levels for a Small Number of Murine Cytokines.

A) Preparation of Labeled RNA.

1) From Each of the Preselected Genes.

Fourteen genes (IL-2, IL-3, IL-4, IL-6, IL-10, IL-12p40, GM-CSF, IFN-γ,
TNF-α, CTLA8, β-actin, GAPDH, IL-11 receptor, and Bio B) were each cloned into the
pBluescript II KS (+) phagemid (Stratagene, La Jolla, California, USA). The orientation of
the insert was such that T3 RNA polymerase gave sense transcripts and T7 polymerase
gave antisense RNA.

Labeled ribonucleotides were incorporated in an in vitro transcription (IVT) reaction. Either
biotin- or fluorescein-labeled UTP and CTP (1:3 labeled to unlabeled) plus unlabeled ATP
and GTP were used in the reaction with 23G0 units of T7 RNA polymerase (Epicentre
Technologies, Madison, Wisconsin, USA). In vitro transcription was done with cut
templates in a manner like that described by Melton et al. Nucleic Acids Research, 12:
7035-7056 (1984). A typical in vitro transcription reaction used 5 µg DNA template, a
buffer such as that included in Ambion's Maxiscript In Vitro Transcription Kit (Ambion
Inc., Houston, Texas, USA) and GTP (3 mM), ATP (1.5 mM), and CTP and fluoresceinated
UTP (3 mM total, UTP:Fl-UTP 3:1) or UTP and fluoresceinated CTP (2 mM total,
CTP:Fl-CTP 3:1). Reactions done in the Ambion buffer had 20 mM DTT and RNase inhibitor.
The reaction was run from 1.5 to about 8 hours.

Following the reaction, unincorporated nucleotide triphosphates were
removed using a size-selective membrane (Microcon-100) or a Pharmacia microspin S-200
column. The total molar concentration of RNA was based on a measurement of the
absorbance at 260 nm. Following quantitation of RNA amounts, RNA was fragmented
randomly to an average length of approximately 50-100 bases by heating at 94 °C in 40
mM Tris-acetate pH 8.1, 100 mM potassium acetate, 30 mM magnesium acetate for 30-40
minutes. Fragmentation reduces possible interference from RNA secondary structure,
and minimizes the effects of multiple interactions with closely spaced probe molecules.

2) From cDNA libraries. 

Labeled RNA was produced from one of two murine cell lines: T10, a B
cell plasmacytoma which was known not to express the genes (except IL-10, actin and
GAPDH) used as target genes in this study, and 2D6, an IL-12 growth dependent T cell
line (Th1 subtype) that is known to express most of the genes used as target genes in this
study. Thus, RNA derived from the T10 cell line provided a good total RNA baseline
mixture suitable for spiking with known quantities of RNA from the particular target
genes. In contrast, mRNA derived from the 2D6 cell line provided a good positive control,
providing typical endogenously transcribed amounts of the RNA from the target genes.

i) The T10 murine B cell line.
The T10 cell line (B cells) was derived from the IL-6 dependent murine
plasmacytoma line T1165 (Nordan et al. (1986) Science 233: 566-569) by selection in the
presence of IL-11. To prepare the directional cDNA library, total cellular RNA was
isolated from T10 cells, and poly(A)+ RNA was selected
using the PolyAtract kit (Promega, Madison, Wisconsin, USA). First and second strand
cDNA were synthesized according to Toole et al. (1984), except that
5-methyldeoxycytidine 5'-triphosphate (Pharmacia, Piscataway, New Jersey, USA)
was substituted for dCTP in both reactions.






To determine cDNA frequencies, T10 libraries were plated, and DNA was
transferred to nitrocellulose filters and probed with ³²P-labeled β-actin, GAPDH and IL-10
probes. Actin was represented at a frequency of 1:3000, GAPDH at 1:1000, and IL-10 at
1:35,000. Labeled sense and antisense T10 RNA samples were synthesized from NotI- and
SfiI-cut cDNA libraries in in vitro transcription reactions as described above.

ii) The 2D6 murine helper T cell line.
The 2D6 cell line is a murine IL-12 dependent T cell line developed by
Fujiwara et al. Cells were cultured in RPMI 1640 medium with 10% heat-inactivated fetal
calf serum (JRH Biosciences), 0.05 mM β-mercaptoethanol and recombinant murine IL-12
(100 units/mL, Genetics Institute, Cambridge, Massachusetts, USA). For cytokine
induction, cells were preincubated overnight in IL-12-free medium and then resuspended
(10^ cells/ml). After incubation for 0, 2, 6 and 24 hours in media containing 5 nM calcium
ionophore A23187 (Sigma Chemical Co., St. Louis, Missouri, USA) and 100 nM
4-phorbol-12-myristate 13-acetate (Sigma), cells were collected by centrifugation and
washed once with phosphate buffered saline prior to isolation of RNA.

Labeled 2D6 mRNA was produced by directionally cloning the 2D6 cDNA
with λZipLox NotI-SalI arms available from GibcoBRL in a manner similar to T10. The
linearized pZL1 library was transcribed with T7 to generate sense RNA as described above.

iii) RNA preparation.
For material made directly from cellular RNA, cytoplasmic RNA was
extracted from cells by the method of Favaloro et al. (1980) Meth. Enzymol., 65: 718-749,
and poly(A)+ RNA was isolated with an oligo dT selection step (PolyAtract, Promega).
RNA was amplified using a modification of the procedure described by Eberwine et al.
(1992) Proc. Natl. Acad. Sci. USA, 89: 3010-3014 (see also Van Gelder et al. (1990)
Proc. Natl. Acad. Sci. USA, 87: 1663-1667). Poly(A)+ RNA was converted into
double-stranded cDNA using a cDNA synthesis kit (Life Technologies) with an oligo dT
primer incorporating a T7 RNA polymerase promoter site. After second strand synthesis,
the reaction mixture was extracted with phenol/chloroform and the double-stranded DNA
isolated using a membrane filtration step (Microcon-100, Amicon, Inc., Beverly,
Massachusetts, USA). Labeled cRNA was made directly from the cDNA pool with an IVT
step as described above. The total molar concentration of labeled cRNA was determined
from the absorbance at 260 nm, assuming an average RNA size of 1000 ribonucleotides.
RNA concentration was calculated using the conventional conversion that 1 OD is
equivalent to 40 µg of RNA, and that 1 µg of cellular mRNA consists of 3 pmoles of RNA
molecules.
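The arithmetic behind these conversions can be sketched in Python. The 40 µg per OD factor and the 3 pmol per µg figure come from the text; the 1 cm path length, the dilution and volume parameters, and the 340 g/mol average ribonucleotide mass used in the cross-check are assumptions for illustration, and the function names are hypothetical.

```python
UG_RNA_PER_A260 = 40.0   # conventional: 1 A260 unit ~ 40 ug/ml RNA (1 cm path assumed)
PMOL_PER_UG_MRNA = 3.0   # from the text: 1 ug cellular mRNA ~ 3 pmol of molecules


def rna_amount(a260, volume_ml, dilution=1.0):
    """Return (ug_per_ml, total_ug, total_pmol) for an RNA sample."""
    ug_per_ml = a260 * dilution * UG_RNA_PER_A260
    total_ug = ug_per_ml * volume_ml
    return ug_per_ml, total_ug, total_ug * PMOL_PER_UG_MRNA


def pmol_from_mass(ug, length_nt, nt_mass=340.0):
    """Cross-check: pmol of molecules of a given length (nt_mass in g/mol is assumed)."""
    # ug * 1e-6 g / (length_nt * nt_mass g/mol) gives mol; multiply by 1e12 for pmol.
    return ug * 1e6 / (length_nt * nt_mass)
```

A reading of A260 = 0.25 on a 20-fold dilution of a 50 µl sample gives 200 µg/ml, 10 µg and about 30 pmol. For 1000-nucleotide molecules, the cross-check gives roughly 2.9 pmol per µg, consistent with the 3 pmol figure.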

Cellular mRNA was also labeled directly without any intermediate cDNA
or RNA synthesis steps. Poly(A)+ RNA was fragmented as described above, and the 5'
ends of the fragments were kinased and then incubated overnight with a biotinylated
oligoribonucleotide (5'-biotin-AAAAAA-3') in the presence of T4 RNA ligase (Epicentre
Technologies). Alternatively, mRNA was labeled directly by UV-induced crosslinking to a
psoralen derivative linked to biotin (Schleicher & Schuell).

B) High Density Array Preparation

A high density array of 20-mer oligonucleotide probes was produced using
VLSIPS technology. The high density array included the oligonucleotide probes as listed
in Table 2. A central mismatch control probe was provided for each gene-specific probe,
resulting in a high density array containing over 16,000 different oligonucleotide probes.
Table 2. High density array design. For every probe there was also a mismatch control
having a central 1-base mismatch.

Probe Type                                      Number of Probes

Test Probes:
  IL-2                                          691
  IL-3                                          751
  IL-4                                          361
  IL-6                                          691
  IL-10                                         481
  IL-12p40                                      911
  GM-CSF                                        661
  IFN-γ                                         991
  TNF-α
  mCTLA8                                        391
  IL-11 receptor                                158

House Keeping Genes:
  GAPDH                                         388
  β-actin                                       669

Bio B (preparation/amplification control)       286

The high density array was synthesized on a planar glass slide.



C) Array Hybridization and Scanning.

The RNA transcribed from cDNA was hybridized to the high density
oligonucleotide probe array(s) at low stringency and then washed under more stringent
conditions. The hybridization solutions contained 0.9 M NaCl, 60 mM NaH2PO4, 6 mM
EDTA and 0.005% Triton X-100, adjusted to pH 7.6 (referred to as 6x SSPE-T). In
addition, the solutions contained 0.5 mg/ml unlabeled, degraded herring sperm DNA
(Sigma Chemical Co., St. Louis, Missouri, USA). Prior to hybridization, RNA samples
were heated in the hybridization solution to 99 °C for 10 minutes, placed on ice for 5
minutes, and allowed to equilibrate at room temperature before being placed in the
hybridization flow cell. Following hybridization, the solution was removed, the arrays were
washed with 6x SSPE-T at 22 °C for 7 minutes, and then washed with 0.5x SSPE-T at
40 °C for 15 minutes. When biotin-labeled RNA was used, the hybridized RNA was
stained with a streptavidin-phycoerythrin conjugate (Molecular Probes, Inc., Eugene,
Oregon, USA) prior to reading. Hybridized arrays were stained with 2 µg/ml
streptavidin-phycoerythrin in 6x SSPE-T at 40 °C for 5 minutes.

The arrays were read using a scanning confocal microscope (Molecular
Dynamics, Sunnyvale, California, USA) modified for the purpose. The scanner uses an
argon ion laser as the excitation source, and the emission was detected with a
photomultiplier tube through either a 530 nm bandpass filter (fluorescein) or a 560 nm
longpass filter (phycoerythrin).

Nucleic acids of either sense or antisense orientation were used in
hybridization experiments. Arrays for either orientation (reverse complements of
each other) were made using the same set of photolithographic masks by reversing the
order of the photochemical steps and incorporating the complementary nucleotide.
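Why reversing the step order and complementing each nucleotide yields reverse-complement arrays can be checked with a toy simulation. This Python sketch is a simplification under stated assumptions: each site's sequence is treated as a string written in order of synthesis, 5'/3' chemistry, protecting groups and coupling yield are ignored, and the mask and site names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize(schedule, masks):
    """Simulate light-directed synthesis.

    schedule -- ordered list of (mask_id, base) coupling steps
    masks    -- dict mapping mask_id to the set of array sites it exposes
    Returns a dict of site -> sequence, written in order of synthesis.
    """
    seqs = {}
    for mask_id, base in schedule:
        for site in masks[mask_id]:
            seqs[site] = seqs.get(site, "") + base
    return seqs


def complementary_schedule(schedule):
    """Same masks, reversed step order, complementary nucleotide at each step."""
    return [(mask_id, COMPLEMENT[base]) for mask_id, base in reversed(schedule)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


if __name__ == "__main__":
    masks = {"m1": {"s1", "s2"}, "m2": {"s1"}, "m3": {"s2"}, "m4": {"s1", "s2"}}
    schedule = [("m1", "A"), ("m2", "C"), ("m3", "G"), ("m4", "T")]
    forward = synthesize(schedule, masks)
    reverse = synthesize(complementary_schedule(schedule), masks)
    for site in sorted(forward):
        print(site, forward[site], reverse[site])
```

Each site receives the same subset of steps in both runs, so its second sequence is the first one read backwards with every base complemented, which is the reverse complement.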

D) Quantitative Analysis of Hybridization Patterns and Intensities.

The quantitative analysis of the hybridization results involved counting the
instances in which the perfect match probe (PM) was brighter than the corresponding
mismatch probe (MM), averaging the differences (PM minus MM) for each probe family
(i.e., probe collection for each gene), and comparing the values to those obtained in a
side-by-side experiment on an identically synthesized array with an unspiked sample (if
applicable). The advantage of the difference method is that signals from random cross
hybridization contribute equally, on average, to the PM and MM probes, while specific
hybridization contributes more to the PM probes. By averaging the pairwise differences,
the real signals add constructively while the contributions from cross hybridization tend to
cancel.

in the averane of" the ditterence trM-ivim; 
values was interpreted by comparison wiili tiie icsuits ci :,pildiig expcn-c^-t?; as weii iV? 
signal obseri'ed for the internal standard bacterial RNA spiked into each sample at a known 
a:rT:cunt. Analy5'<! was performed usmg algontiims aiiu sufivviuC dcscr.bcd hsrem. 
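The counting and averaging steps described above can be sketched as follows. This is an illustrative Python sketch, not the software referred to in the text; the function name and the intensity values are hypothetical. For one probe family it reports how many pairs have PM > MM and the average PM - MM difference:

```python
from statistics import mean


def summarize_probe_family(pairs):
    """Summarize one probe family (all PM/MM pairs for one gene).

    pairs: list of (pm_intensity, mm_intensity) tuples.
    Returns (number of pairs with PM > MM, average of PM - MM).
    """
    positive = sum(1 for pm, mm in pairs if pm > mm)
    avg_diff = mean(pm - mm for pm, mm in pairs)
    return positive, avg_diff


# Hypothetical intensities (arbitrary counts) for one gene, spiked vs. unspiked.
spiked = [(620, 180), (540, 210), (300, 320), (710, 150)]
unspiked = [(190, 175), (200, 205), (310, 330), (160, 150)]

for label, family in (("spiked", spiked), ("unspiked", unspiked)):
    n_pos, avg = summarize_probe_family(family)
    print(f"{label}: {n_pos}/{len(family)} pairs PM>MM, average PM-MM = {avg:.1f}")
```

In this made-up example both samples contain some pairs with PM above MM, but only the spiked sample gives a large average difference; random cross hybridization largely cancels in the unspiked one.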



E) Optimization of Probe Selection 
In order to optimize probe selection for each of the target genes, the high
density array of oligonucleotide probes was hybridized with the mixture of labeled RNAs
transcribed from each of the target genes. Fluorescence intensity at each location on the
high density array was determined by scanning the high density array with a laser
illuminated scanning confocal fluorescence microscope connected to a data acquisition
system.

Probes were then selected for further data analysis in a two-step procedure. 
First, in order to be counted, the difference in intensity between a probe and its 
corresponding mismatch probe had to exceed a threshold limit (50 counts, or about half 
background, in this case). This eliminated from consideration probes that did not hybridize
well and probes for which the mismatch control hybridized at an intensity comparable to
the perfect match. 

Second, the high density array was hybridized to a labeled RNA sample which, in
principle, contains none of the sequences on the high density array. In this case, the
oligonucleotide probes were chosen to be complementary to the sense RNA. Thus, an
antisense RNA population should have been incapable of hybridizing to any of the probes
on the array. Where either a probe or its mismatch showed a signal above a threshold value
(100 counts above background), it was not included in subsequent analysis.

Then, the signal for a particular gene was calculated as the average difference
(perfect match - mismatch control) for the selected probes for each gene.
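The two-step selection and the final per-gene signal can be sketched as below. The 50-count and 100-count thresholds come from the text; the data layout (lists of (PM, MM) tuples, with the blank-sample intensities already background-subtracted) and the helper names are assumptions for illustration:

```python
from statistics import mean

DIFF_THRESHOLD = 50         # counts; step 1 threshold from the text
CROSS_HYB_THRESHOLD = 100   # counts above background; step 2 threshold from the text


def select_probes(target_pairs, blank_pairs):
    """Return indices of probe pairs that pass both selection steps.

    target_pairs: (PM, MM) intensities after hybridizing the target RNA mixture.
    blank_pairs:  background-subtracted (PM, MM) intensities after hybridizing an
                  RNA sample that should not hybridize to any probe (assumed layout).
    """
    selected = []
    for i, ((pm, mm), (blank_pm, blank_mm)) in enumerate(zip(target_pairs, blank_pairs)):
        if pm - mm <= DIFF_THRESHOLD:
            continue  # step 1: weak or non-discriminating pair
        if blank_pm > CROSS_HYB_THRESHOLD or blank_mm > CROSS_HYB_THRESHOLD:
            continue  # step 2: probe or mismatch lights up with non-target RNA
        selected.append(i)
    return selected


def gene_signal(target_pairs, selected):
    """Average (PM - MM) over the selected probe pairs for one gene."""
    return mean(target_pairs[i][0] - target_pairs[i][1] for i in selected)


# Hypothetical intensities for four probe pairs of one gene.
target = [(400, 120), (210, 190), (520, 300), (350, 90)]
blank = [(20, 15), (10, 5), (150, 40), (30, 25)]
keep = select_probes(target, blank)
print(keep, gene_signal(target, keep))
```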

F) Results: The High Density Arrays Provide Specific and Sensitive Detection of
Target Nucleic Acids. 

As explained above, the initial arrays contained more than 16,000 probes
that were complementary to 12 murine mRNAs (9 cytokines, 1 cytokine receptor, and 2
constitutively expressed genes, β-actin and glyceraldehyde 3-phosphate dehydrogenase),
1 rat cytokine and 1 bacterial gene (E. coli biotin synthetase, bioB), which serves as a
quantitation reference. The initial experiments with these relatively simple arrays were
designed to determine whether short in situ synthesized oligonucleotides can be made to
hybridize with sufficient sensitivity and specificity to quantitatively detect RNAs in a
complex cellular RNA population. These arrays were intentionally highly redundant,
containing hundreds of oligonucleotide probes per RNA, many more than necessary for the
determination of expression levels. This was done to investigate the hybridization
behavior of a large number of probes and to develop general sequence rules for a priori
selection of minimal probe sets for arrays covering substantially larger numbers of genes.
The oligonucleotide arrays contained collections of pairs of probes for each
of the RNAs being monitored. Each probe pair consisted of a 20-mer that was perfectly
complementary (referred to as a perfect match, or PM probe) to a subsequence of a
particular message, and a companion that was identical except for a single base difference
in a central position. The mismatch (MM) probe of each pair served as an internal control
for hybridization specificity. The analysis of PM/MM pairs allowed low intensity
hybridization patterns from rare RNAs to be sensitively and accurately recognized in the
presence of cross hybridization signals.
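A PM/MM pair of this kind can be generated from the PM sequence. In this hypothetical sketch the "central position" is taken to be index len(pm) // 2 (the 11th base of a 20-mer) and the substituted base is the complement of the original; the text specifies only a single base difference in a central position, so both choices are assumptions:

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def make_probe_pair(pm: str) -> Tuple[str, str]:
    """Return (PM, MM) where MM differs from PM only at the central base.

    Assumption: the central base is index len(pm) // 2 (0-based) and it is
    replaced by its complement.
    """
    center = len(pm) // 2
    mm = pm[:center] + COMPLEMENT[pm[center]] + pm[center + 1:]
    return pm, mm


pm, mm = make_probe_pair("ACGTTGCAAGGCTTACGATC")  # hypothetical 20-mer
diffs = [i for i, (a, b) in enumerate(zip(pm, mm)) if a != b]
print(mm, diffs)
```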

For array hybridization experiments, labeled RNA target samples were
prepared from individual clones, cloned cDNA libraries, or directly from cellular mRNA
as described above. Target RNA for array hybridization was prepared by incorporating
fluorescently labeled ribonucleotides in an in vitro transcription (IVT) reaction and then
randomly fragmenting the RNA to an average size of 30 - m bases. Samples were
hybridized to arrays in a self-contained flow cell (volume ~200 µl) for times ranging from
30 minutes to 22 hours. Fluorescence imaging of the arrays was accomplished with a
scanning confocal microscope (Molecular Dynamics). The entire array was read at a
resolution of 11.25 µm (~80-fold oversampling in each of the 100 x 100 µm synthesis
regions) in less than 15 minutes, yielding a rapid and quantitative measure of each of the
individual hybridization reactions.
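The ~80-fold oversampling figure follows from the ratio of the synthesis-region size to the scan resolution, squared. A short arithmetic check in Python (variable names are illustrative):

```python
feature_um = 100.0  # edge length of a synthesis region (µm), from the text
pixel_um = 11.25    # scan resolution (µm), from the text

pixels_per_edge = feature_um / pixel_um
pixels_per_feature = pixels_per_edge ** 2
print(f"{pixels_per_edge:.2f} pixels per edge, ~{pixels_per_feature:.0f} pixels per region")
```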

1) Specificity of Hybridization

To evaluate the specificity of hybridization, the high density array
described above was hybridized with mixtures of IL-2, IL-3, IL-4, IL-6,
Actin, GAPDH and Bio B, or of IL-10, IL-12p40, GM-CSF, IFN-γ, TNF-α, mCTLA8
and Bio B. The hybridized array showed strong specific signals for all of the test target
nucleic acids with minimal cross hybridization.
2) Detection of Gene Expression Levels in a Complex Target Sample.
To determine how well individual RNA targets could be detected in the
presence of total mammalian cell message populations, spiking experiments were carried
out. Known amounts of individual RNA targets were spiked into labeled RNA derived
from a representative cDNA library made from the murine B cell line T10. The T10 cell
line was chosen because, of the cytokines being monitored, only IL-10 is expressed at a
detectable level.

Because simply spiking the RNA mixture with the selected target genes and
then immediately hybridizing might provide an artificially elevated reading relative to the
rest of the mixture, the spiked sample was subjected to a series of procedures to mitigate
differences between the library RNA and the added RNA. Thus the "spike" was added to
the sample, which was then heated to 37 °C and annealed. The sample was then frozen,
thawed, boiled for 5 minutes, cooled on ice and allowed to return to room temperature
before performing the hybridization.

Figure 2A shows the results of an experiment in which 13 target RNAs
were spiked into the total RNA pool at a level of 1:3000 (equivalent to a few hundred
copies per cell). RNA frequencies are given as the molar amount of an individual RNA
per mole of total RNA. Figure 2B shows a small portion of the array (the boxed region of
2A) containing probes specific for interleukin-2 and interleukin-3 (IL-2 and IL-3) RNA,
and Figure 2C shows the same region in the absence of the spiked targets. The
hybridization signals are specific, as indicated by the comparison between the spiked and
unspiked images, and perfect match (PM) hybridizations are well discriminated from
mismatches (MM), as shown by the pattern of alternating brighter rows (corresponding to
PM probes) and darker rows (corresponding to MM probes). The observed variation
among the different perfect match hybridization signals was highly reproducible and
reflects the sequence dependence of the hybridizations. In a few instances, the perfect
match (PM) probe was not significantly brighter than its mismatch (MM) partner because
of cross hybridization with other members of the complex RNA population. Because the
patterns are highly reproducible, and because detection does not depend on only a single
probe per RNA, infrequent cross hybridization of this type did not preclude sensitive and
accurate detection of even low level RNAs.



Similarly, infrequent poor hybridization due to, for example, RNA or probe
secondary structure, the presence of polymorphisms or database sequence errors does not
preclude detection. An analysis of the observed patterns of hybridization and cross
hybridization led to the formulation of the general rules, described herein, for the
selection of oligonucleotide probes with the best sensitivity and specificity.

3) Relationship between Target Concentration and Hybridization Signal 

A second set of spiking experiments was carried out to determine the range
of concentrations over which hybridization signals could be used for direct quantitation of
RNA levels. Figure 3 shows the results of experiments in which the ten cytokine RNAs
were spiked together into 0.05 mg/ml of labeled RNA from the B cell (T10) cDNA library
at levels ranging from 1:300 to 1:300,000. A frequency of 1:300,000 is that of an mRNA
present at less than a few copies per cell. In 10 µg of total RNA and a volume of 200 µl, a
frequency of 1:300,000 corresponds to a concentration of approximately 0.5 picomolar and
0.1 femtomole (~6 x 10^7 molecules, or about 30 picograms) of specific RNA.
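The conversions quoted above can be reproduced with a few lines of arithmetic. The sketch assumes an average RNA length of about 1,000 nucleotides at roughly 330 g/mol per nucleotide; these values are not stated in the text and are used only to convert mass to moles. It also uses a 200 µl volume, which is consistent with 10 µg of total RNA at 0.05 mg/ml:

```python
AVOGADRO = 6.022e23      # molecules per mole
total_rna_g = 10e-6      # 10 µg of total RNA (from the text)
volume_l = 200e-6        # 200 µl hybridization volume
frequency = 1 / 300_000  # molar frequency of the target RNA (from the text)

# Assumptions (not from the source): ~1,000 nt average length, ~330 g/mol per nt.
avg_molar_mass = 1_000 * 330.0  # g/mol

total_rna_mol = total_rna_g / avg_molar_mass
target_mol = total_rna_mol * frequency
target_molar = target_mol / volume_l
target_molecules = target_mol * AVOGADRO
# Mass fraction approximates molar fraction when transcript lengths are similar.
target_mass_pg = total_rna_g * frequency * 1e12

print(f"{target_mol * 1e15:.2f} fmol, {target_molar * 1e12:.2f} pM, "
      f"{target_molecules:.1e} molecules, {target_mass_pg:.0f} pg")
```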

Hybridizations were carried out in parallel at 40 °C for 15 to 16 hours. The
presence of each of the 10 cytokine RNAs was reproducibly detected above the
background even at the lowest frequencies. Furthermore, the hybridization intensity was
linearly related to RNA target concentration between 1:300,000 and 1:3000 (Figure 3).
Between 1:3000 and 1:300, the signals increased by a factor of 4 - 5 rather than 10 because
the probe sites were beginning to saturate at the higher concentrations in the course of a
15-hour hybridization. The linear response range can be extended to higher concentrations
by reducing the hybridization time. Short and long hybridizations can be combined to
quantitatively cover more than a lO'-fold range in RNA concentration. 
Blind spiking experiments were performed to test the ability to detect and
quantitate multiple related RNAs present at a wide range of concentrations in a complex
RNA population. A set of four samples was prepared that
contained 0.05 mg/ml of sense RNA transcribed from the murine B cell cDNA library,
plus combinations of the 10 cytokine RNAs, each at a different concentration. Individual
cytokine RNAs were spiked at one of the following levels: 0, 1:300,000, 1:30,000, 1:3000 or
1:300. The four samples plus an unspiked reference were hybridized to separate arrays
for 15 hours at 40 °C. The presence or absence of an RNA target was determined by the
pattern of hybridization and how it differed from that of the unspiked reference, and the
concentrations were determined from the intensities. The concentrations of each of the ten
cytokines in the four blind samples were correctly determined, with no false positives or
false negatives.

One case is especially noteworthy: IL-10 is expressed in the mouse B cells
used to make the cDNA library, and was known to be present in the library at a frequency
of 1:60,000 to 1:30,000. In one of the unknowns, an additional amount of IL-10 RNA
(corresponding to a frequency of 1:300,000) was spiked into the sample. The amount of
the spiked IL-10 RNA was correctly determined, even though it represented an increase of
only 10 - 20% above the intrinsic level. These results indicate that subtle changes in
expression are sensitively determined by performing side-by-side experiments with
identically prepared samples on identically synthesized arrays.
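The 10 - 20% figure follows directly from the stated frequencies: the spiked 1:300,000 IL-10 is compared with intrinsic levels of 1:60,000 and 1:30,000. Exact fractions avoid rounding:

```python
from fractions import Fraction

spike = Fraction(1, 300_000)
intrinsic_low, intrinsic_high = Fraction(1, 60_000), Fraction(1, 30_000)

# Relative increase contributed by the spike at each end of the intrinsic range.
increase_vs_low = spike / intrinsic_low    # spike relative to 1:60,000
increase_vs_high = spike / intrinsic_high  # spike relative to 1:30,000
print(f"{float(increase_vs_high):.0%} to {float(increase_vs_low):.0%}")
```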

Example 2 

T Cell Induction Experiments Measuring Cytokine mRNAs as a Function of Time Following Stimulation.

The high density arrays of this invention were next used to monitor cytokine
mRNA levels in murine T cells at different times following a biochemical stimulus. Cells
from the murine T helper cell line (2D6) were treated with the phorbol ester
4-phorbol-12-myristate 13-acetate (PMA) and a calcium ionophore. Poly(A)+ mRNA was
then isolated at 0, 2, 6 and 24 hours after stimulation. Isolated mRNA (approximately
1 µg) was converted to labeled antisense RNA using a procedure that combines a
double-stranded cDNA synthesis step with a subsequent in vitro transcription reaction.
This RNA synthesis and labeling procedure amplifies the entire mRNA population by
20- to 50-fold in an apparently unbiased and reproducible fashion (Table 2).

The labeled antisense T-cell RNA from the four time points was then
hybridized to DNA probe arrays for 2 and 22 hours. A large increase in the γ-interferon
mRNA level was observed, along with significant changes in four other cytokine mRNAs
(including IL-10, GM-CSF and TNF-α). As shown in Figure 4, the cytokine messages were
not induced with identical kinetics. Changes in cytokine mRNA levels of less than 1:130,000
were unambiguously detected along with the very large changes observed for γ-interferon.

These results highlight the value of the large experimental dynamic range 
inherent in the method. The quantitative assessment of RNA levels from the hybridization
results is direct, with no additional control hybridizations, sample manipulation,
amplification, cloning or sequencing. The method is also efficient. Using current 
protocols, instrumentation and analysis software, a single user with a single scanner can 
read and analyze as many as 30 arrays in a day. 

Example 3

Higher-Density Arrays Containing 65,000 Probes for Over 100 Murine Genes

Figure 5 shows an array that contains over 65,000 different oligonucleotide 
probes (50 µm feature size) following hybridization with an entire murine B cell RNA
population. Arrays of this complexity were read at a resolution of 7.5 µm in less than
fifteen minutes. The array contains probes for 118 genes, including the 12 murine genes
represented on the simpler array described above, 102 additional murine
genes, three bacterial genes and one phage gene. There are approximately 300 probe pairs
per gene, with the probes chosen using the selection rules described herein. The probes
were chosen from the 600 bases of sequence at the 3' end of the translated region of each
gene. A total of 21 murine RNAs were unambiguously detected in the B cell RNA 
population, at levels ranging from approximately 1:300,000 to 1:100. 

Labeled RNA samples from the T cell induction experiments (Fig. 4) were
hybridized to these more complex 118-gene arrays, and similar results were obtained for
the set of genes in common to both chip types. Expression changes were unambiguously
detected for more than 20 other genes in addition to those shown in Figure 4.

To determine whether much smaller sets of probes per gene are sufficient
for reliable detection of RNAs, hybridization results from the 118-gene chip were analyzed
using ten different subsets of 20 probe pairs per gene. That is to say, the data were
analyzed as if the arrays contained only 20 probe pairs per gene. The ten subsets of 20
pairs were chosen from the approximately 300 probe pairs per gene on the arrays. The
initial probe selection was made utilizing the probe selection and pruning algorithms
described above. The ten subsets of 20 pairs were then randomly chosen from those
probes that survived selection and pruning. Labeled RNAs were spiked into the murine B
cell RNA population at levels of 1:25,000, 1:50,000 and 1:100,000. Changes in
hybridization signals for the spiked RNAs were consistently detected at all three levels
with the smaller probe sets. As expected, the hybridization intensities do not cluster as
tightly as when averaging over larger numbers of probes. This analysis indicates that sets
of 20 probe pairs per gene are sufficient for the measurement of expression changes at low
levels, but that improvements in probe selection and experimental procedures are
preferred to routinely detect RNAs at the very lowest levels with such small probe sets.
Such improvements, including but not limited to higher stringency hybridizations
coupled with the use of slightly longer oligonucleotide probes (e.g., 25-mer probes), are in
progress.
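The subset analysis can be mimicked in a few lines. The sketch below assumes per-pair PM - MM values for one gene are already available after selection and pruning, draws ten random subsets of 20 pairs, and averages each one; the input values are randomly generated placeholders, not data from these experiments:

```python
import random
from statistics import mean, pstdev


def subset_signals(differences, n_subsets=10, subset_size=20, seed=0):
    """Average (PM - MM) over random subsets of probe pairs for one gene.

    differences: per-pair PM - MM values that survived selection and pruning.
    Returns one average per subset, mimicking an array with only
    `subset_size` probe pairs per gene.
    """
    rng = random.Random(seed)
    return [mean(rng.sample(differences, subset_size)) for _ in range(n_subsets)]


# Placeholder per-pair differences for one gene (~300 pairs), not measured data.
rng = random.Random(42)
diffs = [rng.gauss(120.0, 80.0) for _ in range(300)]

full_signal = mean(diffs)
subsets = subset_signals(diffs)
print(f"all pairs: {full_signal:.1f}")
print(f"20-pair subsets: mean {mean(subsets):.1f}, spread (SD) {pstdev(subsets):.1f}")
```

Comparing the spread of the subset averages with the all-pairs average illustrates why small probe sets cluster less tightly.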

Example 4 
Scale Up to Thousands of Genes 

A set of four high density arrays, each containing oligonucleotide probes to
approximately 1650 different human genes, provided probes to a total of 6620
genes. There were about 20 probes for each gene. The feature size on arrays was 50 
microns. This high density array was successfully hybridized to a cDNA library using 
essentially the protocols described above. Similar sets of high density arrays containing 
oligonucleotide probes to every known expressed sequence tag (EST) are in preparation. 

Example 5 

Direct Scale up for the Simultaneous Monitoring of Tens of Thousands of RNAs.

In addition to being sensitive, specific and quantitative, the approach
described here is intrinsically parallel and readily scalable to the monitoring of very large
numbers of mRNAs. The number of RNAs monitored can be increased greatly by
decreasing the number of probes per RNA and increasing the number of probes per array.
For example, using the above-described technology, arrays containing as many as 400,000 
probes in an area of 1.6 cm² (20 x 20 µm synthesis features) are currently synthesized and
read. Using 20 probe pairs per gene allows 10,000 genes to be monitored on a single array
while maintaining the important advantages of probe redundancy. A set of four such
arrays could cover the more than 40,000 human genes for which there are expressed
sequence tags (ESTs) in the public databases, and new ESTs can be incorporated as they
become available. Because of the combinatorial nature of the chemical synthesis, arrays of
this complexity are made in the same amount of time with the same number of steps as the
simpler ones used here. The use of even fewer probes per gene and arrays of higher
density makes possible the simultaneous monitoring of all sequenced human genes on a
single chip or a small number of small chips.
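The capacity figures quoted above follow from simple arithmetic, assuming every 20 x 20 µm feature holds one probe and that each of the 20 probe pairs per gene occupies two features (PM and MM):

```python
area_um2 = 1.6e8    # 1.6 cm^2 expressed in µm^2 (1 cm^2 = 1e8 µm^2)
feature_um = 20     # 20 x 20 µm synthesis features
probes_per_array = area_um2 / feature_um ** 2

probes_per_gene = 2 * 20  # 20 probe pairs (PM + MM) per gene
genes_per_array = probes_per_array / probes_per_gene
arrays_for_40k_genes = 40_000 / genes_per_array
print(int(probes_per_array), int(genes_per_array), arrays_for_40k_genes)
```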

The quantitative monitoring of expression levels for large numbers of genes 
will prove valuable in elucidating gene function, exploring the causes and mechanisms of 
disease, and for the discovery of potential therapeutic and diagnostic targets. As the body 
of genomic information grows, highly parallel methods of the type described here provide 

an efficient and direct way to use sequence information to help elucidate the underlying
physiology of the cell. 

Example 6 
Probe Selection Using a Neural Net 

A neural net can be trained to predict the hybridization and cross 
hybridization intensities of a probe based on the sequence of bases in the probe, or on other 
probe properties. The neural net can then be used to pick an arbitrary number of the "best" 
probes. When a neural net was trained to do this it produced a moderate (0.7) correlation 
between predicted intensity and measured intensity, with a better model for cross
hybridization than for hybridization.
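Once a model predicts both intensities for each candidate probe, picking the "best" probes reduces to a ranking, and the quoted 0.7 figure is a correlation between predicted and measured values. A hypothetical Python sketch; scoring by predicted hybridization minus predicted cross hybridization is an assumption, since the text does not define "best", and the probe names and values are placeholders:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pick_best(predictions, k):
    """Return the k probes with the highest (hyb - cross_hyb) score.

    predictions: dict mapping probe id -> (predicted hyb, predicted cross hyb).
    The scoring rule is an assumption for illustration.
    """
    ranked = sorted(predictions,
                    key=lambda p: predictions[p][0] - predictions[p][1],
                    reverse=True)
    return ranked[:k]


preds = {"PROBE_A": (0.9, 0.2), "PROBE_B": (0.6, 0.5), "PROBE_C": (0.7, 0.1)}
print(pick_best(preds, 2))
```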



A) Input/Output Mapping.

The neural net was trained to identify the hybridization properties of 20-mer
probes. The 20-mer probes were mapped to an eighty-bit-long input vector, with the first
four bits representing the base in the first position of the probe, the next four bits
representing the base in the second position, etc. Thus, the four bases were encoded as
follows:

A: 1000 C: 0100 G: 0010 T: 0001 
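The encoding above can be written as a small function. A minimal sketch that places the first base of the probe (as written) in bits 0-3; the function name is hypothetical:

```python
ENCODING = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def encode_probe(probe: str) -> list:
    """Map a 20-mer to the 80-bit input vector, four bits per position."""
    if len(probe) != 20:
        raise ValueError("expected a 20-mer probe")
    bits = []
    for base in probe.upper():
        bits.extend(ENCODING[base])
    return bits


vec = encode_probe("ACGTTGCAAGGCTTACGATC")  # hypothetical 20-mer
print(len(vec), vec[:8])
```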

The neural network produced two outputs: hybridization intensity and
crosshybridization intensity. The output was scaled linearly so that 95% of the outputs
from the actual experiments fell in the range 0 to 1.

B) Neural Net Architecture. 

The neural net was a backpropagation network with 80 input neurons, one
hidden layer of 20 neurons, and an output layer of two neurons. A sigmoid transfer
function, s(x) = 1/(1 + exp(-x)), was used, which maps input values into the range 0 to 1
in a non-linear (sigmoid) manner.
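A forward pass through the 80-20-2 architecture can be sketched as follows. The weight shapes follow the 81 x 20 and 2 x 20 matrices given for this network; treating the 81st row of the first matrix as a hidden-layer bias and using no output bias are assumptions about the layout, and the random weights are placeholders:

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic transfer function s(x) = 1 / (1 + exp(-x)), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights_1, weights_2):
    """Forward pass of an 80-20-2 network.

    inputs:    80 values (an encoded 20-mer probe).
    weights_1: 81 x 20; rows 0-79 connect inputs to hidden units and row 80
               is treated as a bias row (assumed layout).
    weights_2: 2 x 20; one row per output (hybridization, cross hybridization),
               with no output bias (assumed).
    """
    extended = list(inputs) + [1.0]  # constant 1.0 drives the assumed bias row
    n_hidden = len(weights_1[0])
    hidden = [sigmoid(sum(x * weights_1[i][j] for i, x in enumerate(extended)))
              for j in range(n_hidden)]
    return [sigmoid(sum(h * w for h, w in zip(hidden, row))) for row in weights_2]


# Placeholder weights; in practice these would be loaded from Tables 3 and 4.
rng = random.Random(0)
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(81)]
w2 = [[rng.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(2)]
x = [1, 0, 0, 0] * 20  # an encoded poly(A) 20-mer
hyb, cross_hyb = forward(x, w1, w2)
print(f"hybridization {hyb:.3f}, cross hybridization {cross_hyb:.3f}")
```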

C) Neural Net Training. 

The network was trained using the default parameters from Neural Works 
Professional 2.5 for a backprop network. (Neural Works Professional is a product of
NeuralWare, Pittsburgh, Pennsylvania, USA.) The training set consisted of approximately
8000 examples of probes, and the associated hybridization and crosshybridization 
intensities. 
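For readers without Neural Works, ordinary online backpropagation can stand in for the training step. The sketch below is not a reproduction of the Neural Works defaults: the learning rate, initialization and epoch count are assumptions, and a toy data set of 16 dinucleotides with made-up targets replaces the ~8000 real probe examples so that it runs quickly:

```python
import itertools
import math
import random


def sigmoid(x):
    """Logistic transfer function s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(w1, w2, inputs):
    """Forward pass; the last row of w1 is treated as the hidden-layer bias."""
    x = list(inputs) + [1.0]
    h = [sigmoid(sum(x[i] * w1[i][j] for i in range(len(x)))) for j in range(len(w1[0]))]
    y = [sigmoid(sum(hj * wkj for hj, wkj in zip(h, row))) for row in w2]
    return x, h, y


def train(examples, n_in, n_hidden, n_out=2, lr=0.5, epochs=200, seed=0):
    """Online backpropagation with squared-error loss (hyperparameters assumed)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in + 1)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    for _ in range(epochs):
        for inputs, targets in examples:
            x, h, y = predict(w1, w2, inputs)
            # Output deltas: gradient of 0.5 * (y - t)^2 through the sigmoid.
            d_out = [(y[k] - targets[k]) * y[k] * (1 - y[k]) for k in range(n_out)]
            # Hidden deltas use w2 before it is updated below.
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w2[k][j] for k in range(n_out))
                     for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden):
                    w2[k][j] -= lr * d_out[k] * h[j]
            for i in range(n_in + 1):
                for j in range(n_hidden):
                    w1[i][j] -= lr * d_hid[j] * x[i]
    return w1, w2


def mse(w1, w2, examples):
    """Mean squared error over all outputs of all examples."""
    errs = [(y - t) ** 2 for inp, tgt in examples
            for y, t in zip(predict(w1, w2, inp)[2], tgt)]
    return sum(errs) / len(errs)


# Toy stand-in for the real training set: all 16 dinucleotides, one-hot encoded
# (8 inputs), with made-up targets that rise or fall with G+C content.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
examples = []
for pair in itertools.product("ACGT", repeat=2):
    gc = sum(b in "GC" for b in pair) / 2
    examples.append((ONE_HOT[pair[0]] + ONE_HOT[pair[1]], [0.2 + 0.6 * gc, 0.8 - 0.6 * gc]))

w1_0, w2_0 = train(examples, n_in=8, n_hidden=4, epochs=0)
w1, w2 = train(examples, n_in=8, n_hidden=4, epochs=500)
print(f"MSE before {mse(w1_0, w2_0, examples):.4f}, after {mse(w1, w2, examples):.4f}")
```

The trained matrices have the same layout as the forward pass described for this network: (inputs + 1) rows by hidden units for the first layer, and outputs by hidden units for the second.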



D) Neural Net Weights. 

Neural net weights are provided in two matrices: an 81 x 20 matrix (weights_1, Table 3)
and a 2 x 20 matrix (weights_2, Table 4).

Table 3. Neural net weights (81 x 20 matrix) (weights_1).
-0.0635492
0.18790121 
0.02378313 
-0.0403537 
-0.06940,5 1 

-0.0731941 

0.06500423 

0.03036973 

0.17097448 

0.05442215 
0.08829379

0.0749867
0.08858298
0.16058824

0.2882407
0.24270254

0.30585453
0.1234024

0.98782647
0.10313182

0.37975076 

0.00025528 

0.50065184 

0.03180836 

0.75752454 

0.03615654 

0.3820785 

-0.0761788 

0.01620384 

-0.316401 

-0.2450078



0.51860482 

-0.2268714 

0.17145836 

-0.4994924 

0.20251468 

-0.76558 II 

0.37193877 

-0.6195157 

0.47984859 

-0.5177853 

0.31378582 

-0.203447 

0.21313109 



-0.1440069 
-0.1994763 
-0.0149997 
-0.7120748 
0.06202703 
-0.7413587 
0.04010354 
-C.S1SSSS2 
-0.1248241 
-0.5642462 
-0.0242514 
-0.1306533 
0.09779498 



0.19151463 

0.31717882 

0.32802406 

0.75497276 

0.39860719 

0.7155844 

0.47959387 

0.80366057 

0.33738744 

0.36228263 

0.48470935 

0.25734761 

0.12461348 



0.05522444 
0.12304886 
0.47659361 
-0.1078557 
0.57867163 
-0.0193744 
0.82366729 
n n/j538p22 
0.56647652 
-0.0609947 
0.35473567 
-0.1468564 
0.08537519 



-0.1127352 

0.3,^736522 

-0.3898261 

0.35112098 

-0.7198414 

0.39701831 

-0.9032337 

0.3884458 

-0.5510914 

0.40129057 

-0.2453159 

0.17168433 

0.10632347 



-0.0711868 
-0.1611445 
-0.4639786 
0.10635795 
-0.6733171 
-0.1180785 
-0.6429569 
-0.1471086 
-0.6294683 
-0.0350918 
-0.3512402 
0.25235301 
-0.0738487 






-0.1147067 


-0.0084124 


-0.5239977 


-0.5021591 


0.02636886 


0.1470097 


-0.5139894 


-0.6221746 


-0.3979228 


0.30136263 


-0.742976 


-0.4011821 


0.19038832 


0.55414283 


-1.1652025 


-0.3686967 


-0.4750175 


0.54713631 


-0.9312411 


-0.410718 


-0.1498093 


0.55332947 


-1.0870041 


-0.4378341 


-0.5433689 


0.92539561 


-0.9013531 


-0.6145319 


-0.5512772 


1.0310978 


-o.y4>y, /vs 




-o. / o-/ > * 1^ 




-0.7092296


-0.894987 


.0 6RQr>;5.5 


1.1 2J^ i () 1 ! 


-0.5 iOOlM') 


-u.o^v-rvU*- 




1.3315079


■ 1.023 n*^:^ 


-0.5556009 








-G .656201 4 


-0.6568274 


1.1967098 


-1.150661 


-0.5503616 


"0.6640182 


0.S469S49S 


-0.7811472 


-0.3740913 


-0.4527726 


0.649 '1795 


-0.6970047 


■0.5759697 


-n 4704399 


0.51728982 


-U.345236 


-U.OJ 1 iv/^i 




n 371^7478 


-U. / /JJOJt 


n -> A-) 1 n07 


-0.408 3 HQ? 


-().tH52(>»3 


A <^ <•« o *^ r> 

-U.zj:>t;o / o 




-0.1544528




-0.8989772 


-().<0S8974 




A ^ 1 ^n^ni^ 


-0.4815812 


-0.5319371 


-1.3798244
0.07143499

0.1549352 

0.44703272 

0.2595928 

0.53066176 

0.1702383 

0.5403164 

-0.0922Q8 

0.22238699 

-0.180493 

0.17421109 

-0.1982318 

0.05979542 

-0.1978694 

0.06230025 

0.1073643 

-0.0272076 

0.33091745 

0.01069087 

0.11231339 

-0.0213237 

0.12980145 

0.15833771 

-0.0696303 

0.05538817 

-0.0677462 

0.14652038 

-0.1929855 

-0.0786668 

-0.0029815 

-0.1259448 

-0.1091328 

0.00333312 

0.14768517 

0.0611263 

0.099518.59 

0.05554885 



-0.0365961 
0.01934035
•0.0525739 

0.19904579 
0.01703933 



-0.1589592 

-0.0608833 

-0.6194252 

-0.119705 

"0.9705743 

0.02221953 

-0.5077381 

0.21902563 

-0.156256 

0.17164391 

-0.0730809 

0.06996673 

-0.0623277 

0.05119598 

-0.0752745 

-0.090154 

-0.1014201 

-0.0610701 

0.02569587 

-0.0392407 

-0.0261696 

-0.038394 

0.01835199 

0.03802699 

0.01067943 

-0.0772208 

0.06084725 

0.00694158 

0.05454836 

-0.0837616 

-0.0845026 

0.0090488 

-0.2812204 

0.02989549 

-0.1895157

0.14843601 

-0.3743191

0.09599103 

-0.0962418 
-0.0073082 
0.060S6259 

-0.2001437 
0.06875326 



0.04816094 

0.21059546 

0.19459446 

0.4913742 

0.1324198 

0.44412452 

0.00849557 

0.25788471 

-0.2092034 

0.15690604 

-0.3717274 

0.19735655 

-0.2521037 

-0.2067173

0.32974288 

-0.0938452 

0.19723812 

0.01335303 

0.11676744 

0.06117272 

0.09474246 

0.08167668 

0.04420554 

0.0806741 

0.04131892 

0.16641215 

-0.1150111 

0.26604816

-0.0834711

0.02468397 

0.10171869 

0.06142418 

0.02039073 

0.09454407 

0.08583955 

0.12351749 

0.0205463 

0.0570596 
ft m 1 <:i7/:o 

0.01007566 
-0.0489736 
-0 l788n6Q= 



-0.0301291 

-0.4705076 

-0.0523894 

-0.8455008 

0.0898297,1 

-0.7700244 

0.1611405

-0.3861519 

0.16458821 

-0.0254563 

0.1436436 

0.05625506 

0.0944353 



0.00985043 

0.00704324 

-0.0935401 

0.02156818 

-0.0213131 

-0.0234323 

-0.0100756 

-0.0105376 

0.02605363 

0.03993953 

-0.0267609 

0.09142463 

-0.0687876 



0.07707115 

0.03531792 

-0.0541042 

-0.167912 

-0.052828 

-0.1860176 

0.09382812 

-0.1327625 

0.12675567 

-0.152338' 

0.09737793

-0.0049753 
0.1045/312 



0.04977471 0.26628217 
0.09066898 -0.2003548 



0.15144217 

0.16360784 

0.31194624 

0.15694356 

0.4390Q672 

0.10496679 

0.31764683 

-0.2022993 

0.20111787 

-0.1990184 

-0.0215865 

-0.241524 

-0.0492548 



0.07881941 

0.2569764 

0.0913924 

0.21619918 

0.1322203 

0.14693312 

0.10580003 

0.02142166 

0.27427858 

-0.0121658 

0.14418064 

0.02115551 

0.10878915 



0.05659099 

-0.1437671 

0.05257236 

-0.098868 

-0.0439769 

-0.0505908 

-0.0001466 

0.10949049 

00//S8UI 

0.08184241

0.0V08>! 1 6V 

0.01404589 

-0.0520154 



-0.3037405 
-0.0684895 
-0.8030509 
-0.0023983 
-0.8588745 
0.14137991 
-0.5240273 
0.13711917 
-0.1418906 
0.10211211 
-0.2363243 
0.12768924 
0.05238663 



-0.0835249 

0.08700065 

-0.0728388 

-0.0909865 

0.11848255 

0.13509636 

-0.0147534 

-0.0161705 

0.05774866 

0.07568218 

0.0897231 

-0.0876383 

0.32776353 



-0.0285798 

0.10122854 

0.04065102 

0.02574896 

-0.0458286 

0.088718 

-0.4065202 

0.07129322 

-0. i S69074 

0.C07C4I22 

-0.0406134 
-0.0454775 



0.26507998 0.0629771 
0.39202845

0.65535748 

0.95144385 

0.99852085 

1.2572207 

0.73526824 

0.95438999 

0.4S917389 

0.33946255 

0.19083619 

0.30190364 

0.18584418 

0.13698889 

0.31540671 

0.00119518 

0.74023747 

0.06987014 

0.7840901 

0.05702339 

0.70519674 

0.11747536 

0.81293154 

0.24770954 

0.54755467 

0.03049339 

0.1008145 

-0.0048454 

-0.0273505 

0.1325469 

-0.0007111 

-0.118853 

0.07947435 

-0.162177 

0.04106503 

-0.0012895 

o 1 1 

-0.142963 
u. 1990339'; 
-0.0027455 
0.25000233 



-0.6033413 

0.32430753 

-1.2075449 

0.48870567 

"1.5854638 

0.31977594 

-1.2543333 

0.27823627 

-0.5412283 

0.37056214 

-0.3655235 

0.34009755 

-0.0798945 

0.08274947 

-0.1978176 

0.38564634 

-0.5168169 

0.4372991 

-0.5161278 

0.15731441 

-0.612968 

0.18651071 

-0.4320194 

-0.1913544 
0.01412579 

0.1204864 

0.10494121 

0.15324508 

0.13285491 

0.26435438 

0.07329605 

0.18712705 

0.08498254 

0.2371086 

0. 107563 3 *^ 

G.C29S95S9 
0.16604523 
0.05931767 



0.57940209 -0.0460919 



0.64831889 
0.94851351 
1.7470727 
0.89351815 
1.2270083 
0.55854511 
0.26928344 
0.1085042 
0.24114503 
0.33355939 
4.5490937

0.3366704 

0.11212139 

0.59532708 

0.03748908 

1.0081589 

0.13783893 

0.66693234 

0.08724558 

0.98160452 

0.03182137 

0.72470272 

0.22105552 

0.4782092 

0.42727205

0.15507312 

0.1988914 

-0.01398 

-0.1658676 

-0.0775707 

-0.0903666 

0.03216886 

-0.0325038 

0.14713244 

-0.0486093 



-1.0950515 

-0.0852669 

-1.7586045 

0.39586932 

-1.2818555 

0.1672449 

-0.9804664 

0.44658452 

-0.3020035 

0.44246852 



0.17313539 

-0.428847 

-0.0309942 

-0.6475483 

-0.0517421 

-0.8574924 

-0.0496743 

-0.7325026 

0.02407174 

-0.7051651 

0.12951751 

-0.3489864

-0.098419 



0.25648347 

0.09454013 

0.08281901 

0.25348473 

0.09143513 

0.10754076 

0.04698242 

0.29328787 

-0.053306 

0.05799349 

0.05942665 



0.53419203 

0.80829531 

0.94320357 

0.56886804 

1.586942 

0.71813524 

0.56084049 

0.62299174 

0.39120093 

0.39015424 

0.17172456 



0.01228174 

0.57447821 

-0.0107875 

0.87958473 

0.08651814 

0.90612286 

0.07689167 

0.65517086 

0.02613025 

0.89682412 

0.14626819 

0.4620938 

-0.0160188 



-0.7680888 

0.05049393 

-1.680338 

0.66196042 

-1.63657V5 

0.37488377 

-0.7980669 

0.53984308 

-0.5676367 

0.09788869 

-0.3479928 



-0.2679709 

-0.0305296 

-0.7312108 

0.05327692 

-0.761238 

0.06334394 

-0.5775976 

0.29064488 

-0.677594 

0.181806 

-0.3964331 

0.06516677 

0.07177288 



0.06245366 -0.0775013 



0.03982652 

-0.0560908 

0.07909692 

0.08835109 

-0.1019902 

0.04456592 

-0.0385783 

0.01249749 

-0.0808243 

0.21323961 

^0 143813 



0.14641231 

0.07466536 

0.36858437 

0.16466415 

0.29236633 

0.18368921 

0.2276271 

0.10016124 

0.28909287 

-0.0118695 

0.2167SS24 

0 jn*67f»48 

0.21550164 






0.04679342 
-0.1704439 
-0.215752 



0.10158926 

0.302394 

0.32740423 



-0.122116 

-0.0671487 

-0.1597161 



0.23491009 
0.33251444 
0.18950906 



-U.05817O5 
-0.1232446 



0.21095584 
0.27883759 






wo 97/27317 PCT/US97/01603 






-0.0430407 

-0.1322077 

0.10109599 

"0.0808031 

0.13912162 

-0.2270383 

0.01596376 

-0.1284984 

0.00538179 

-0.0861699 

0.20031671 



0.04886867 

0.2981362 

0.23081669 

0.15750171 

0,047.56131 

0.22945035 

0.03504543 

0.24145114 

0.05302088 

0.05814215 

0.23140682 



-0.0914212 

0.1254565 

-0.1617257 

0.08072432 

•■0.1625126 

0.18167619 

0.00964208 

0.20540115 

-0.1001294 

0.21307872 

0.16010799



0.28192514 
0.15627012 
0.29508773 
0.12990661 
0.25232118 
0.00080986 
0.11757879 
0.07580803 
0.27505419 
0.01372274 



0.05275658 
0.04116358 
-0.0405337 
-0.1935954 
G.04736055 
-0.1253632 
•0.0230768 
-0.0932236 
0.22654785 
0.04515802 



0.21014904 
0.08507752 
-0.0497829 
0.2912066:^ 
-0.053Q935 
0.15695702 
0.04350457 
0.14288881 
0.02395938 
-0.0269269 





0.37838998 


0.00934576 


-0.139213 


0.29823828 


0.40640026 


-0.067578 




-0.038453 


0.24550894 


0.30729383 


-0.2807365 


-0.0689575 


0.26537073 




0.58336282 


-0.2145292 


-0.2378269 


0.25939462


0.64761585


-0.3581158




0.07741276 


0.45081589 


0.65251595


-0.4543131 


-0.0671543


0.48592216




0.85640681 


-0.6068144 


-0.1187844 


0.35959438


0.71842372 


-0.7140775 




-0.0642752 


0.37914035 


0.71409059 


-0.7180941 


0.21169594 


0.27888221 




0.79736245 


-0.7102081 


0.14268413 


0.41374633 


0.75569016 


-0.7394939 




0.02592243 


0.37013471 


0.82774776 


-0.8136597 


0.24068722 


0.45081198 




0.88004726 


-0.6990998 


0.23456772 


0.24596012 


0.67229778 


-0.8148533 




0.30492786 


0.39735735 


0.55497372 


-0.6593497 


0.20656242 


0.3752968 




0.54989374 


-0.5660355 


0.1205707 


Q22Z1119S 


0.46045718 


-0.519361 




0.17151839 


0.39539635 


0.50465524


-0.3791285


0.07184427


0.36315975




0.51068121


-0.3502096 


-0 2094818 


0 31471297 


0 18174268 


-0 1241962 




-0.1255455 


0.35898197 


0.79502285










0.02952595


-0.0751979


-0.2556099


-0.3040917


-0.0942183


-0.0541431




-0.6262965 


-0.1423945 


-0.0537339 


0.11189342 


-0.3791296 


-0.3382006 




0.02978903 


0.20563391 


-0.5457558 


-0.3666513 


-0.1922515 


0.29512301 




-0.7473708 


-0.0415357 


0.18283925 


0.28153449 


-0.7847292 


-0.2313099 




0.00290797 


0.6284017 


-0.6397845 


-0.5606785 


-0.1479581 


0.57049137 




-1.0829539 


-0.1822221 


-0.1832336 


0.49371469 


-0.6362705 


-0.2790937 




0.06966544 


0.75524592 


-0.9053063 


-0.5826979 


-0.114608 


0.90401584 




-0.8823278 


-0.3404879 


-0.0334436 


0.50130409 


-0.57275 


-0.3842527 




0.0915129 


0.44590429 


-0.7808504 


-0.4399623 


-0.1189605 


0.59226018 




-0.499517 


-0.4873153 


-0.2889721 


0.47303999 


-0.4015501 


-0.2875251 




0.1106236 


0.27437851 


-0.6061368 


-0.4166524


-0.0637606 


0.33875695 




0.6255118


-0.1046614 


-0.2710638 


0 >'6425925 


-0.4125208 






0. ! 468 i 92 


-0.i7i9S56 


-0.4140109




0.02873472 






. fl T 1 


-0.1335077


-0.7155944 










0.06424081 


-0.09/8306 


-0.1169782 


0.13909493 


-0.CS38893 


-0.1300299 






0.11563963


-0.0709175 


-0.028875 


-0.1718288 


-0.026291 




0.uSS3336i 


-0.033985 
0.05850215


0.03830531 


-0.0893732 






0.13403182 




-0.012636 


-0.1925185 


0.13028348 


-0.0045112 


0.05260766 


-0.2759708 
-0.0395793
-0.0917266 
0.23327024 
0.10926479 
0.12219627 
-0.1091286 
-0.0210903
-0.1233738 
0.06584878 



0.03069885 

-0.2185763 

-0.0898143 

-0.1167006 

0.05705986 

-0.075133 

0.11607172 

-0.0760847 

-0.0323083 



0.07913893 
0.04743406 
-0.0578982
0.18223672 
-0.0505442 
0.02949276 
-0.0943146
0.00098273 



-0.1470363
-0.0364127 
-0.2096201 
0.09710353 
-0.1334345 
-0.0217044 
-0.1014408 
0.07522969 



-0.0581293



0.09080192 

0.00991712 

0.09257686 

0.03838636 

-0.0204458 

-0.0782921 

0.02903902 

0.05794976 



0.19741131
-0.2093729 
0.00566842
-0.2026017
0.01167099
-0.1160332 
0.02963065 
-0.1959872 
Table 4. Second neural net weighting matrix (2x21) (weights_2).

-0.5675537
-0.5328685 
-0.209518 
2.0453076 

0.55343837 
0.12712023 
0.19891988 
3.1242371 



-0.6119734 
0.31165671 
1.6362301 
0.08412334 

0.68506879 
-1.7462951 
-4.0000067 
0.22860088 



0.20069507 
-0.9999997 
-1.9999975 
-0.1645829

-1.1869608 
0.0818732 
-0.5605077 
1.6726165



0.26132998 
-0.4128213 
-0.2563241 



0.39551663 

6.111361 

1.3601962 



-0.5071653 
-1.0000007 
0.04389827 



0.38050765 
0.62210494 
1.7318885 



0.2793434 

-0.6456627 

1.7597554 



0.40832204 
0.42921746 
-1.0558798 



F. Code for running the net.

Code for running the neural net is provided below in Table 5 (neural_n.c) and Table 6 (lin_alg.c).






Table 5. Code for running the neural net (neural_n.c). 



#define local far
#include <windows.h>
#include <alloc.h>
#include "utils.h"
#include <string.h>
#include <ctype.h>
#include <stdio.h>
#include <math.h>
#include <mem.h>
#include "dc3_uu!.h"
#include "cmpwin.h"
#include "lin_alg.h"

void reportProblem( char local * message, short errorClass );
char iniFileName[] = "designer.ini";

/* Logistic activation 1/(1+exp(-x)), applied in place to every element. */
static void sigmoid( vector local * transformMe ){
    short i;

    for( i = 0; i < transformMe->size; i++ )
        transformMe->values[i] = 1/(1+exp(-1 * transformMe->values[i]));
}

/* Number of columns = number of tab-separated fields in the first row. */
static short getNumCols( char far * buffer ){
    short count = 1;
    for( ; *buffer != 0; buffer++ )
        if( *buffer == '\t' ) count++;
    return count;
}

/* Number of rows = number of NUL-terminated strings; the list ends with an empty string. */
static short getNumRows( char far * buffer ){
    char far * last, far * current;
    short count = -1;
    current = buffer;
    do{
        count++;
        last = current;
        current = strchr( last+1, 0 );
    } while( current > last+1 );
    return count;
}

/* Parse whitespace-separated floats, row by row, into an already allocated matrix. */
static void readMatrix( matrix local * theMat, char far * buffer ){
    short i, j;
    char far * temp;
    temp = buffer;

    for( i = 0; i < theMat->numRows; i++ ){
        for( j = 0; j < theMat->numCols; j++ ){
            /* skip blanks and the single NUL that ends each row */
            while( isspace( *temp ) || (*temp == 0 && *(temp-1) != 0) ) temp++;
            sscanf( temp, "%f", &theMat->values[i][j] );
            while( !isspace( *temp ) && *temp != 0 ) temp++;
        }
    }
}

/* Read the [weights_1] and [weights_2] sections of the .ini file into two matrices. */
short readNeuralNetWeights( matrix local * weights1, matrix local * weights2 ){
    char far * buffer;
    int copiedLength;
    short numCols, numRows;

    buffer = farcalloc( MaxNumLines * MaxLineSize, sizeof( char ) );
    if( buffer == NULL ){ errorHwnd( "failed to allocate file reading buffer" ); return FALSE; }

    /* with a NULL key name, the section entries (here, one row of weights each)
       are copied as NUL-terminated strings followed by an extra NUL */
    copiedLength = GetPrivateProfileString( "weights_1", NULL, "\0\0", buffer,
                                            MaxNumLines * MaxLineSize, iniFileName );
    if( copiedLength < 10 || copiedLength >= (MaxNumLines * MaxLineSize - 10) ){
        errorHwnd( "failed to read .ini file" );
        farfree( buffer );
        return FALSE;
    }
    numCols = getNumCols( buffer );
    numRows = getNumRows( buffer );
    if( !allocateMatrix( weights1, numRows, numCols ) ){ farfree( buffer ); return FALSE; }
    readMatrix( weights1, buffer );

    copiedLength = GetPrivateProfileString( "weights_2", NULL, "\0\0", buffer,
                                            MaxNumLines * MaxLineSize, iniFileName );
    if( copiedLength < 10 || copiedLength >= (MaxNumLines * MaxLineSize - 10) ){
        errorHwnd( "failed to read .ini file" );
        farfree( buffer );
        return FALSE;
    }
    numCols = getNumCols( buffer );
    numRows = getNumRows( buffer );
    if( !allocateMatrix( weights2, numRows, numCols ) ){ farfree( buffer ); return FALSE; }
    readMatrix( weights2, buffer );

    farfree( buffer );
    return TRUE;
}

/* Two-layer forward pass: output = sigmoid( weights2 * [sigmoid( weights1 * input ), 1] ). */
short runForward( vector local * input, vector local * output,
                  matrix local * weights1, matrix local * weights2 ){
    vector hiddenLayer;

    /* one extra slot for the bias unit feeding the output layer */
    if( !allocateVector( &hiddenLayer, (short)(weights1->numRows + 1) ) ) return FALSE;

    if( !vectorTimesMatrix( input, &hiddenLayer, weights1 ) ){
        freeVector( &hiddenLayer ); return FALSE;
    }
    sigmoid( &hiddenLayer );
    hiddenLayer.values[hiddenLayer.size - 1] = 1;
    if( !vectorTimesMatrix( &hiddenLayer, output, weights2 ) ){
        freeVector( &hiddenLayer ); return FALSE;
    }
    freeVector( &hiddenLayer );
    sigmoid( output );
    return TRUE;
}

static vector inputVector = {NULL, 0}, outputVector = {NULL, 0};
static matrix firstWeights = {NULL, 0, 0}, secondWeights = {NULL, 0, 0};

static short beenHereDoneThis = FALSE;

/* Load the weights and allocate the input/output vectors once. */
static short makeSureNetIsSetUp( void ){
    if( beenHereDoneThis ) return TRUE;

    if( !readNeuralNetWeights( &firstWeights, &secondWeights ) ) return FALSE;
    if( !allocateVector( &inputVector, firstWeights.numCols ) ) return FALSE;
    if( !allocateVector( &outputVector, secondWeights.numRows ) ) return FALSE;

    beenHereDoneThis = TRUE;
    return TRUE;
}

void removeNetFromMemory( void ){
    freeVector( &inputVector ); freeVector( &outputVector );
    freeMatrix( &firstWeights ); freeMatrix( &secondWeights );
    beenHereDoneThis = FALSE;
}

/* Estimate hybridization (hyb) and cross-hybridization (xHyb) for one probe sequence. */
short nnEstimateHybAndXHyb( float local * hyb, float local * xHyb, char local * probe ){
    short probeLength, i;

    if( !makeSureNetIsSetUp() ) return FALSE;
    probeLength = (short)(strlen( probe ));

    /* the net expects 4 inputs per base plus one bias input */
    if( (probeLength * 4 + 1) != inputVector.size ){
        // reportProblem( "Neural net not set up to deal with probes of this length", 0 );
        if( (probeLength * 4 + 1) > inputVector.size ){
            // reportProblem( "probe being trimmed to do analysis", 1 );
            probeLength = (short)(inputVector.size / 4);
        }
    }

    memset( inputVector.values, 0, inputVector.size * sizeof( float ) );
    inputVector.values[inputVector.size - 1] = 1;      /* bias input */
    /* one indicator per base position; lookupIndex() is not shown in this listing */
    for( i = 0; i < probeLength; i++ )
        inputVector.values[i * 4 + lookupIndex( tolower( probe[i] ) )] = 1;

    if( !runForward( &inputVector, &outputVector, &firstWeights, &secondWeights ) ) return FALSE;

    *hyb = outputVector.values[0];
    *xHyb = outputVector.values[1];
    return TRUE;
}
Table 6. Code for running the neural net (lin_alg.c).

/* lin_alg.c */
#include "utils.h"
#include "lin_alg.h"
#include <alloc.h>

/* Allocate a rows x columns matrix of zeros; on failure, free what was allocated. */
short allocateMatrix( matrix local * theMat, short rows, short columns ){
    short i;

    theMat->values = calloc( rows, sizeof( float local * ) );
    if( theMat->values == NULL ){ errorHwnd( "failed to allocate matrix" ); return FALSE; }

    for( i = 0; i < rows; i++ ){
        theMat->values[i] = calloc( columns, sizeof( float ) );
        if( theMat->values[i] == NULL ){
            errorHwnd( "failed to allocate matrix" );
            for( i--; i >= 0; i-- )
                free( theMat->values[i] );
            free( theMat->values );
            theMat->values = NULL;
            return FALSE;
        }
    }
    theMat->numRows = rows; theMat->numCols = columns;
    return TRUE;
}

short allocateVector( vector local * theVec, short columns ){
    theVec->values = calloc( columns, sizeof( float ) );
    if( theVec->values == NULL ){ errorHwnd( "failed to allocate vector" ); return FALSE; }
    theVec->size = columns;
    return TRUE;
}

void freeVector( vector local * theVec ){
    free( theVec->values );
    theVec->values = NULL;
    theVec->size = 0;
}

void freeMatrix( matrix local * theMat ){
    short i;

    for( i = 0; i < theMat->numRows; i++ )
        free( theMat->values[i] );
    free( theMat->values );
    theMat->values = NULL;
    theMat->numRows = theMat->numCols = 0;
}

/* Dot product of two float arrays of the given length. */
float vDot( float local * input1, float local * input2, short size ){
    float returnValue = 0;
    short i;

    for( i = 0; i < size; i++ )
        returnValue += input1[i] * input2[i];
    return returnValue;
}

/* output[i] = (row i of mat) . input; requires input->size == mat->numCols. */
short vectorTimesMatrix( vector local * input, vector local * output, matrix local * mat ){
    short i;

    if( (input->size != mat->numCols) || (output->size < mat->numRows) ){
        errorHwnd( "illegal multiply" );
        return FALSE;
    }
    for( i = 0; i < mat->numRows; i++ )
        output->values[i] = vDot( input->values, mat->values[i], input->size );
    return TRUE;
}



Example 7 

Generic Difference Screening

High density arrays comprising arbitrary (haphazard) probe oligonucleotides for generic difference screening were produced by shuffling (randomizing) the masks used in light-directed polymer synthesis. The resulting arrays contained more than 34,000 pairs of 25-mer arbitrary probe oligonucleotides. The oligonucleotides in each pair differed by a single nucleotide at position 13.

After hybridization, washing, staining, and scanning as described above, data files (containing information regarding probe identity and hybridization intensity) were created.

Differences in intensity between the two oligonucleotides comprising each probe pair K (where K ranges from 1 to 34,320) were calculated. More specifically, the intensity difference between the oligonucleotides of pair K for replicate j of sample i was calculated as:

$$X_{ijK1} - X_{ijK2}$$
where X is the hybridization intensity, i indicates the sample (in this case sample 1 or 2), j indicates the replicate (in this case replicate 1 or 2 for each sample), K is the probe pair (in this case 1 ... 34,320), 1 indicates one member of the probe pair, and 2 indicates the other member of the probe pair.
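As a minimal sketch of this per-pair difference, assuming the two members of each probe pair are stored in parallel arrays indexed by K (the array layout and the function name pairDifferences are illustrative, not from the source):

```c
#include <stddef.h>

/* diff[k] = member1[k] - member2[k] for one replicate j of one sample i,
   i.e. X_ijK1 - X_ijK2, with K running from 0 to nPairs-1 (0-based here). */
static void pairDifferences(const double *member1, const double *member2,
                            double *diff, size_t nPairs) {
    for (size_t k = 0; k < nPairs; k++)
        diff[k] = member1[k] - member2[k];
}
```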
Figures 16a, 16b, and 16c illustrate the differences between replicates 1 and 2 of sample 1 (Fig. 16a, the normal cell line) and between replicates 1 and 2 of sample 2 (Fig. 16b, the tumor cell line) for each probe. Thus, Fig. 16a plots the value of $(X_{11K1}-X_{11K2})-(X_{12K1}-X_{12K2})$ for K = 1 to 34,320 on the vertical axis and K on the horizontal axis. The two replicates were normalized based on the average ratio of $(X_{11K1}-X_{11K2})/(X_{12K1}-X_{12K2})$ for all probe pairs (i.e., after normalization, the average ratio should approximate 1). Similarly, Fig. 16b plots the value of $(X_{21K1}-X_{21K2})-(X_{22K1}-X_{22K2})$ after normalization between the two replicates based on the average ratio of $(X_{21K1}-X_{21K2})/(X_{22K1}-X_{22K2})$. Figure 16c plots the differences between samples 1 and 2 averaged over the two replicates. This value is calculated as

$$\frac{(X_{11K1}-X_{11K2})+(X_{12K1}-X_{12K2})}{2}-\frac{(X_{21K1}-X_{21K2})+(X_{22K1}-X_{22K2})}{2}$$

after normalization between the two samples based on the average ratio of

$$\frac{\left[(X_{21K1}-X_{21K2})+(X_{22K1}-X_{22K2})\right]/2}{\left[(X_{11K1}-X_{11K2})+(X_{12K1}-X_{12K2})\right]/2}.$$
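The replicate normalization described for Figs. 16a and 16b can be sketched as follows. This is one possible reading, assuming the second replicate's differences are rescaled by the average ratio so that the mean ratio becomes 1, and that pairs with a zero denominator are skipped when averaging; neither choice is specified in the text, and the function names are hypothetical.

```c
#include <stddef.h>

/* Mean of a[k]/b[k] over pairs with b[k] != 0 (skipping zeros is an assumption).
   Returns 1.0 when no pair is usable. */
static double averageRatio(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    size_t used = 0;
    for (size_t k = 0; k < n; k++) {
        if (b[k] != 0.0) { sum += a[k] / b[k]; used++; }
    }
    return used ? sum / (double)used : 1.0;
}

/* out[k] = d1[k] - s * d2[k], where s = averageRatio(d1, d2) rescales replicate 2
   so that the average ratio of d1 to the rescaled d2 is 1. */
static void normalizedReplicateDifference(const double *d1, const double *d2,
                                          double *out, size_t n) {
    double s = averageRatio(d1, d2, n);
    for (size_t k = 0; k < n; k++)
        out[k] = d1[k] - s * d2[k];
}
```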

Figures 17a, 17b, and 17c show the filtered data. Figure 17a shows the relative change in hybridization intensities of replicates 1 and 2 of sample 1 for the difference of each oligonucleotide pair. After normalization between replicates (see above), the ratio is calculated as follows: if the absolute value of $(X_{11K1}-X_{11K2})/(X_{12K1}-X_{12K2})$ is greater than 1, then the ratio is $(X_{11K1}-X_{11K2})/(X_{12K1}-X_{12K2})$; otherwise the ratio is $(X_{12K1}-X_{12K2})/(X_{11K1}-X_{11K2})$ (the inverse). The ratio of replicates 1 and 2 of sample 2 for the difference of each oligonucleotide pair, normalized, filtered, and plotted the same way as in Figure 17a, is shown in Fig. 17b. This ratio is calculated as in Fig. 17a, but based on the absolute values of $(X_{21K1}-X_{21K2})/(X_{22K1}-X_{22K2})$ and $(X_{22K1}-X_{22K2})/(X_{21K1}-X_{21K2})$. Fig. 17c shows the ratio of sample 1 and sample 2 averaged over two replicates for the difference of each oligonucleotide pair. This ratio is calculated as in Fig. 17a, but based on the absolute value of the replicate-averaged differences of the two samples, after normalization as in Fig. 16c.
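The ratio rule used for Figs. 17a to 17c can be written as a small helper. The name foldedRatio is hypothetical, and returning 0 when either difference is zero is an assumption, since the text does not say how such pairs were handled.

```c
#include <math.h>

/* Ratio rule of Fig. 17a: a/b if |a/b| > 1, otherwise the inverse b/a,
   so the returned magnitude is always >= 1. a and b are the two
   (normalized) pair differences being compared. */
static double foldedRatio(double a, double b) {
    if (a == 0.0 || b == 0.0) return 0.0; /* assumption: undefined ratios reported as 0 */
    double r = a / b;
    return fabs(r) > 1.0 ? r : b / a;
}
```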

The oligonucleotide pairs that show the greatest differential hybridization between the two samples can be identified by sorting the observed hybridization ratio and difference values. The oligonucleotides that show the largest change (increase or decrease) can be readily seen from the ratio plot of samples 1 and 2 (Fig. 17c). These differences do not appear to be in the background noise. Based on the identified oligonucleotide pair sequences, a gene or EST with the suspected sequence tag can be searched for in sequence databases, such as GenBank, to determine whether the gene has been cloned and characterized. If the search is negative, appropriate primers can be made to obtain the cDNA or part of the cDNA directly from mRNA, cDNA, or a cDNA library.
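Sorting the probe pairs by the magnitude of their ratio, as described above, might look like the following sketch. The PairScore record and the descending-magnitude ordering are illustrative assumptions; a real screen would presumably also take the difference values into account.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int pair;      /* probe pair index K */
    double ratio;  /* e.g. the folded sample 1 / sample 2 ratio */
} PairScore;       /* hypothetical record */

/* qsort comparator: larger |ratio| first. */
static int byMagnitudeDesc(const void *x, const void *y) {
    double a = fabs(((const PairScore *)x)->ratio);
    double b = fabs(((const PairScore *)y)->ratio);
    return (a < b) - (a > b);
}

/* Rank pairs so the strongest candidates for differential hybridization come first. */
static void rankPairs(PairScore *scores, size_t n) {
    qsort(scores, n, sizeof *scores, byMagnitudeDesc);
}
```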

From Figures 16a and 16b, it is observed that several oligonucleotide pairs show large differences between two replicates of the same sample. It is believed that this results from differential expression in a given tissue. These oligonucleotide pairs detect genes that are likely highly expressed, so the deviation between replicates for these pairs is larger than for oligonucleotide pairs that bind to nucleic acids expressed at low levels (i.e., the standard deviation of the mean is proportional to the mean). That is also why the relative change between two samples is a better indicator of differential expression between two samples (see Fig. 17c). To determine which oligonucleotide pairs are of greatest interest, the absolute and relative difference measures could be combined into a scoring function.

Increasing the number of related oligonucleotide pairs (increased redundancy) and employing two-color hybridization/detection schemes are expected to help reduce the background variation. This allows more sensitive detection of small differences and decreases the noise and occurrence of false positives. The 25-mer array used in this example is a small subset of all possible 25-mers; thus, increasing the total number of oligonucleotide pairs will greatly increase the ability to detect changes in genes of unknown sequence by allowing more complete coverage of the available sequence space.

Nucleic Acid End Labeling
Several RNA transcripts as well as a full mRNA sample from mouse cells were fragmented by heat in the presence of Mg2+. An oligo (deoxyribonucleic acid 6-mer poly A) labeled with either fluorescein or biotin at the 5' end was then ligated to the fragmented RNA using RNA ligase under standard conditions.
The labeling appeared to be efficient, and the hybridization pattern obtained using the labeled RNA as a probe was similar to one obtained using RNA that was labeled during an in vitro transcription step.

Example 9 
Quantification of Labeling Efficiency 
Quantification of the labeling efficiency is accomplished by spiking 
experiments in which specific full-length unfragmented RNA species are spiked into the
total mRNA pool at different concentrations prior to the end-labeling procedure. The 
relative concentrations of the spiked RNA in the pool can then be measured by 
hybridization to a high density array of target oligonucleotides prepared as described 
above. This permits evaluation of the ability to detect particular RNA species at low 
concentration in the mRNA pool. 

Example 10 
PCR Labeling of Nucleic Acids 

Polymerase Chain Reaction (PCR) 

20 µl PCR reactions substituted with 10% biotin-dUTP were conducted, and the quantity of each PCR product was estimated by gel analysis. Approximately 250 fmoles of each PCR product was pooled. A Pharmacia S300 Sephacryl column (cat # 27-5130-01) was prepared with a 1 minute prespin at 3000 x g, followed by a 200 µl wash and spin at 3000 x g for 1 more minute. The pooled PCR product was loaded and spun for 2 minutes at 3000 x g.

The column was discarded and the eluate was speed vacuumed to dryness.

DNase Fragmentation 

The dried-down PCR pool was resuspended in 13 µl H2O from the NEN DuPont End Labeling Kit (cat # NELS24). 2.5 µl CoCl2 and [illegible] µl TdT buffer were added. Gibco BRL DNase I was diluted to 0.25 U/µl using 10 mM Tris pH 8. 1 µl of diluted DNase was added to the PCR product pool and incubated for 6 minutes at 37°C, denatured for 10 minutes at 99°C, and cooled to 4°C. The total volume was 29 µl.

Terminal Transferase (TdT) Labeling

To the fragmented PCR pool, 2 µl of TdT enzyme (from the NEN kit stock) was added, and 4 µl of NEN kit biotin-ddATP was then added. The final volume was 35 µl, and the mixture was incubated at 37°C for 1.5 hr.



Hybridization 

The 35 µl labeled target was split into two 17.5 µl aliquots, one for a coding chip (GeneChip containing sense-strand sequences and permutations thereof) and one for the non-coding (antisense) chip. 182.5 µl of 2.5 M TMACl (Sigma 5 M stock diluted 1:2 using 10 mM Tris pH 8) was added. Triton X-100 was added to a final concentration of 0.001%. In certain experiments, 4 µl of 100 nM control oligonucleotide was added to the solution rather than at the stain step.
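As a worked check of the hybridization mix, assuming the 17.5 µl target aliquot and the 182.5 µl of 2.5 M TMACl are the only significant volume contributions (the Triton X-100 and the optional 4 µl of control oligonucleotide are ignored), the total volume and approximate final TMACl concentration are:

$$V_{\text{total}} = 17.5\ \mu\text{l} + 182.5\ \mu\text{l} = 200\ \mu\text{l}, \qquad c_{\text{TMACl}} \approx \frac{2.5\ \text{M} \times 182.5\ \mu\text{l}}{200\ \mu\text{l}} \approx 2.28\ \text{M}$$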

The mixture was denatured at 95°C for 5 minutes, added directly to the chip cartridge, and hybridized with mixing at 37°C for 60 minutes.

Staining and Washing 

The hybridization solution was removed from the flow cell used in the GeneChip system (Affymetrix, Inc., Santa Clara, CA), and the chamber was manually rinsed 3X with 6X SSPE/0.001% Triton X-100 to remove TMACl.

A phycoerythrin stain solution was prepared as follows: 190 µl 6X SSPE/0.001% Triton X-100 + 10 µl of 20 mg/ml acetylated BSA + 0.4 µl stock phycoerythrin (Molecular Probes Cat # S866) + 4 µl fluorescein control oligo (100 nM stock).

The staining solution was added to the flow cell with mixing at room temperature for 5 minutes. The staining solution was then removed from the flow cell, and the flow cell was manually rinsed 3X with washing buffer.

The chip was washed on the hybridization station (the GeneChip system, Affymetrix, Inc.) using 6X SSPE/0.001% Triton X-100 at 35°C. Nine fill/drain changes of fresh wash solution were used, and scanning took place in this buffer. Target sequences were accurately identified in this experiment.

Example 11 

3' End Labeling PCR Product

PCR product was fragmented and end labeled using TdT from Boehringer Mannheim. After the PCR amplification, 5 µl of a 50 µl PCR reaction was run on a 1% agarose gel to estimate the total yield of the amplification reaction. To fragment the DNA, the remaining 45 µl of solution was combined with DNAse I (diluted in H2O to a final concentration of 5 U DNAse I/ng DNA) and reacted for 15 minutes at 37°C. The DNAse was then heat killed for 10 minutes at 95°C. The fragmented DNA solution was then held at 4°C until ready for the terminal transferase reaction.

The terminal transferase reaction mixture consisted of the fragmented PCR sample, 20 µl 5X terminal transferase reaction buffer, 6 µl 25 mM CoCl2 (final concentration 1.5 mM), 1 µl of fluorescent dideoxynucleotide triphosphates (ddNTP final concentration 10 µM), 2 µl of Boehringer Mannheim terminal transferase (TdT, final concentration 50 U/reaction), and H2O up to a 100 µl volume.
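The stated final concentrations follow from C_final = C_stock x V_added / V_total with the 100 µl reaction volume, for example for the CoCl2 and the 5X reaction buffer:

$$\text{CoCl}_2:\ \frac{25\ \text{mM} \times 6\ \mu\text{l}}{100\ \mu\text{l}} = 1.5\ \text{mM}, \qquad \text{buffer}:\ \frac{5\text{X} \times 20\ \mu\text{l}}{100\ \mu\text{l}} = 1\text{X}$$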

The reaction was incubated for 30 minutes at 37°C. The whole reaction volume was then transferred to a 1.7 ml tube, brought up to 500 µl with 5X SSPE, 0.05% Triton, then hybridized and scanned normally.

Protocols for the 50 µl PCR reaction are found in the instructional materials accompanying the GeneChip™ HIV PRT Assay (Affymetrix, Sunnyvale, CA).



CIAP Improves Base Calling
In certain fragment end labeling experiments, the accuracy of base calling in a GeneChip system was improved when calf intestinal alkaline phosphatase (CIAP) was used during fragmentation with DNAse. See Figure 18.
CIAP is useful in degrading any nucleotides that were not incorporated in any previous amplification, transcription, or other polymerase reactions. Such degradation prevents the incorporation of those nucleotides in subsequent reactions, such as tailing and labeling reactions.

Example 13
Post-Hybridization End Labeling

Post-hybridization end labeling experiments were performed. After hybridization of a target to a probe array in the GeneChip system, the targets were labeled using terminal transferase (shown as TdTase), as shown in Figure 19.

Post-hybridization labeling was shown to yield better results when the probe array (chip) was pre-reacted, as shown in Figures 20 and 21.

Figure 21 also shows the results of a DNAse titration experiment.

The various titration experiments are shown below in Table 7.

Table 7. Hybridization TdTase end labeling call accuracy. Accuracy is based on a ratio of 1.2 between the maximum and the next highest calculated intensities. Calculated intensity = adjusted intensity minus the minimum of A, C, G, or T in the tile set. Adjusted intensity = raw [remainder of definition illegible].

Experiment | Pre-react | Labeling | Accuracy
HM207 | 5 U DNAse; ddTTP = 1.8 mM; dTTP = [illegible]; TdTase = 50 U; Temp = room T; Time = 1 hr | FITC-dUTP = 5 nmol; dATP = 50 nmol; TdTase = 50 U; Temp = room T; Time = 1 hr | At least one strand = 100.0%; Both strands = 91.3%; GeneSeq Composite = NA
HM217 | 5 U DNAse; ddTTP = 1.0 mM; dTTP = 3.0 mM; TdTase = 12.5 U; Temp = room T; Time = overnight | FITC-dUTP = 0.5 nmol; dATP = 5 nmol; TdTase = 5 U; Temp = room T; Time = 15 min | At least one strand = 99.8%; Both strands = 89.6%; GeneSeq Composite = 99.2%
HM770 | ddTTP = [illegible]; TdTase = 12.5 U; Temp = 37°C; Time = overnight | FITC-dUTP = [illegible]; dATP = 5 nmol; TdTase = 5 U; Temp = 37°C; Time = 15 min | At least one strand = [illegible]; Both strands = 91.1%; GeneSeq Composite = [illegible]
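The call criterion in the Table 7 caption can be sketched as below. This is a hedged illustration: the adjusted intensities are taken as inputs because their definition is incomplete in the source, the A, C, G, T ordering and the name callBase are assumptions, and treating a ratio exactly equal to the threshold as a call is a guess.

```c
/* Call a base from four adjusted intensities (order A, C, G, T assumed).
   Calculated intensity = adjusted intensity minus the minimum of the four.
   A call is made only if the highest calculated intensity is at least
   minRatio times the next highest; otherwise 'N' is returned. */
static char callBase(const double adjusted[4], double minRatio) {
    static const char bases[4] = {'A', 'C', 'G', 'T'};
    double calc[4];
    double mn = adjusted[0];
    for (int i = 1; i < 4; i++)
        if (adjusted[i] < mn) mn = adjusted[i];

    int best = 0;
    for (int i = 0; i < 4; i++) {
        calc[i] = adjusted[i] - mn;
        if (calc[i] > calc[best]) best = i;
    }
    double second = -1.0;
    for (int i = 0; i < 4; i++)
        if (i != best && calc[i] > second) second = calc[i];

    if (second <= 0.0) /* next highest is zero: call only if the maximum is positive */
        return calc[best] > 0.0 ? bases[best] : 'N';
    return calc[best] / second >= minRatio ? bases[best] : 'N';
}
```

For example, callBase() with adjusted intensities {100, 300, 120, 110} and a ratio of 1.2 gives calculated intensities {0, 200, 20, 10} and calls 'C'.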
These results show that base calling accuracy can be impacted by the length of the target fragments. Such results further demonstrate the utility of the methods disclosed herein.

Other experiments have shown that 1 U of DNAse is particularly useful in obtaining ideal fragment lengths.

Example 14 
End-Labeling (Tailing) with Poly T 

Nucleic acids tailed with poly-A or poly-A analogs (labeled or unlabeled) using methods similar to those set forth in Example 13 can be labeled using labeled poly-T, as shown in Figure 22.

Example 15
Synthesis of Fluorescent Triphosphate Labels

To 0.5 µmoles (50 µL of a 10 mM solution) of the amino-derivatized nucleotide triphosphate, 3'-amino-3'-deoxythymidine triphosphate (1) or 2'-amino-2'-deoxyuridine triphosphate (2), in a 0.5 ml Eppendorf tube were added 25 µL of a 1 1 M aqueous solution of sodium borate, pH 7, 87 µL of methanol, and 88 µL (10 µmol, 20 equiv) of a 100 mM solution of 5-carboxyfluorescein-X-NHS ester in methanol. The mixture was vortexed briefly and allowed to stand at room temperature in the dark for 15 hours. The sample was then purified by ion-exchange HPLC to afford the fluoresceinated derivatives Formula 3 or Formula 4, below, in about 78-84% yield.
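The nucleotide amount and the ester equivalents can be checked with n = c x V: 50 µL of a 10 mM solution contains 0.5 µmol of the amino-derivatized triphosphate, so 10 µmol of NHS ester corresponds to the stated 20 equivalents.

$$n_{\text{NTP}} = 10\ \text{mM} \times 50\ \mu\text{L} = 0.5\ \mu\text{mol}, \qquad \frac{n_{\text{ester}}}{n_{\text{NTP}}} = \frac{10\ \mu\text{mol}}{0.5\ \mu\text{mol}} = 20\ \text{equiv}$$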
Experiments suggest that these molecules are not substrates for terminal transferase (TdT). It is believed, however, that these molecules would be substrates for a polymerase, such as the Klenow fragment.



Example 10 
Synthesis of as-Triazine-3,5[2H,4H]-diones

The analogs as-triazine-3,5[2H,4H]-dione ("6-azapyrimidine") nucleotides (see Formula 7a) are synthesized by methods similar to those used by Petrie, et al., Bioconj. Chem. 2: 441 (1991).

Other useful labeling reagents are synthesized, including 5-bromo-U/dU 10 or ddU. See, for example, Lopez-Canovas, L. et al., Arch. Med. Res. 25: 189-192 (1994); Li, X., et al., Cytometry 20: 172-180 (1995); Boultwood, J. et al., J. Pathol. 148: 61 ff. (1986); Traincard, et al., Ann. Immunol. 134D: 399-405 (1983); and Figures 23a and 23b set forth herein.

Details of the synthesis of nucleoside analogs corresponding to all of the above structures (in particular those of Fig. 23b) have been described in the literature. Known procedures can be applied in order to attach a linker to the base. The linker-modified nucleosides can then be converted to a triphosphate amine for final attachment of the dye or hapten, which can be carried out using commercially available activated derivatives.

Other suitable labels include non-ribose or non-2'-deoxyribose-containing structures, some of which are illustrated in Fig. 23c, and sugar-modified nucleotide analogues such as are illustrated in Fig. 23d.

Using the guidance provided herein, the methods for the synthesis of reagents and methods (enzymatic or otherwise) of label incorporation useful in practicing the invention will be apparent to those skilled in the art. See, for example, Chemistry of Nucleosides and Nucleotides 3, Townsend, L.B. ed., Plenum Press, New York, at chpt. 4, Gordon, S., "The Synthesis and Chemistry of Imidazole and Benzimidazole Nucleosides and Nucleotides" (1994); Chemistry of Nucleosides and Nucleotides 1, Townsend, L.B. ed., Plenum Press, New York (1994). Such reagents can be made by methods similar to those set forth in Chemistry of Nucleosides and Nucleotides 2, Townsend, L.B. ed., Plenum Press, New York, at chpt. 4, Gordon, S., "The Synthesis and Chemistry of Imidazole and Benzimidazole Nucleosides and Nucleotides" (1994); Lopez-Canovas, L. et al., Arch. Med. Res. 25: 189-192 (1994); Li, X., et al., Cytometry 20: 172-180 (1995); Boultwood, J. et al., J. Pathol. 148: 61 ff. (1986); Traincard, et al., Ann. Immunol. 134D: 399-405 (1983).
Example 11

Biotin-Chem-Link (Boehringer Mannheim)

The labeling density is supposed to be 1 biotin per 10 bases. Coordinative, non-covalent binding of Biotin-Chem-Link to N7 of adenosine and guanosine involves heating 1 ug RNA or DNA + 1 ul BCL in a 20 ul volume at 85°C for 30 minutes.
RNA labeling experiment (4 sets of 4 pooled RNA transcripts)

Very poor labeling and/or hybridization (can't see 5 pM at all; 20 pM is very weak). Samples may have been lost after labeling when Microcon-100s were used to remove unincorporated label. RNA was fragmented after labeling. It is believed that this should not be a problem (BM tech help).

BCL labeling of dsDNA

Low signal, background across the entire chip. No discrimination. 

FastTag (Vector Labs) (RNA)

Should get 1 biotin per 10-20 bases. Five reactions were run:

a) RNA1+RNA2+RNA3 (5 pmoles each, total of 5.2 ug) + 25 ul FastTag reagent

b) RNA1+RNA2+RNA3 (9 pmoles each, total of 9.4 ug) + 25 ul FastTag reagent

c) RNA1+RNA2+RNA3 (18 pmoles each, total of 19 ug) + 40 ul FastTag reagent

d) RNA4+RNA5+RNA6 (10 pmoles each, total of 8.7 ug) + 25 ul FastTag reagent

e) RNA7+RNA8+RNA9 (10 pmoles each, total of 11.4 ug) + 25 ul FastTag reagent

The heat method was used to link S-S to RNA. The result: 20x lower hybridization signal than the same targets labeled by the IVT method.

Example 12

RNA Ligase/BioA6 End Labeling

This experiment generally involved the following steps: a) RNA was fragmented; b) RNA fragments were 5' phosphorylated with polynucleotide kinase/ATP; and c) the 5' end of the RNA was ligated to the 3' end of BioA6 using RNA ligase. This is illustrated by the following formula:

5'-biotin-AAAAAA-OH-3' + 5'-P-RNA-OH-3' → 5'-biotin-AAAAAA-RNA-3'

Previously, this technique was used to label RNA, which was hybridized to unpackaged chips (high density oligonucleotide arrays) (on 2x3 slides) in a 10 ul volume. Lack of mixing was a significant problem and resulted in low hybridization intensities. In vitro transcription (IVT) labeled RNA under these conditions gave 10X higher signal than bio-A6/RNA ligase labeled target.
In other experiments, 3 different ratios of bio-A6:RNA were used:

1) 1x bioA6 (0.5 nmoles biotin-A6 per 1 ug RNA);

2) 2x bioA6; and

3) 4x bioA6.

After labeling, the sample was spun through a Microcon-EZ and a Microcon-3 to remove enzymes and dilute out buffer components.

Bio-A6 labeled target hybridized to chips (high density oligonucleotide arrays) gave approximately the same hyb. intensity as in vitro transcription (IVT) labeled target.

Staining was for 15 minutes with PE at normal conc. No significantly higher signal or background was seen with 4x as much bioA6 per ug RNA.

For these experiments, BioA6 (5' biotin-AAAAAA RNA) was ordered from Genset.

Example 13

Preparation of Gene-Specific Transcripts

Template DNA preparation

Linearization of vector:

If the gene is not already cloned in a vector with T3 and T7 RNA polymerase promoter sites flanking the insert, see PCR amplification below.

The vector is linearized with an enzyme that cuts at the 3' end of the insert for sense transcripts, or at the 5' end for antisense transcripts. The insert sequence was checked to verify that the restriction enzyme (RE) does not cut internally. In a preferred embodiment, a restriction enzyme was chosen that does not produce 3' protruding ends.

After linearization, an aliquot of the sample is run on a gel (next to uncut vector) to verify complete digestion.

The sample is optionally treated with Proteinase K (100-200 ug/ml) at 50°C for 20 min to 1 hour to remove enzyme or residual RNases (used in plasmid miniprep protocols).

The linearized DNA is purified by phenol/chloroform extraction and ethanol precipitation or by 3-4 rounds of Microcon-100 concentration/redilution (see below).

PCR amplification

Amplification is only preferred if the desired region of the gene is not already in a cloning vector with RNA polymerase promoters.

Starting with genomic DNA (or cDNA), amplify the ORF of interest (or the region of the gene represented on the chip) using PCR primers with 5' T3/T7 RNA polymerase promoter sequences and 3' gene-specific sequences.

The following 5' sequence has worked well (with 19-21 gene-specific bases added to the 3' end).



5'-GAATTGTAATACGACTCACTATAGGGAGG-[+19-21 gene-specific bases]-3'



The 5' end consists of:

a) six 5' flanking bases of your choice (not part of the promoter
sequence, but necessary for maximum IVT efficiency)
b) 17 bases of the core T7 RNA polymerase promoter sequence
c) the first 6 bases transcribed (the sequence of +1 to +6 can affect
efficiency)

The other PCR primer would then contain the T3 RNA polymerase promoter 
sequence at the 5' end. The following sequence has worked well: 

5'-AGATGCAATTAACCCTCACTAAAGGGAGA-(+19-21 gene-specific bases)-3'

The 5' end consists of:

a) six 5' flanking bases (sequence can vary from this example) 

b) 17 bases of the core T3 RNA polymerase promoter sequence
c) +1 to +6 transcribed bases
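The primer layout above (6 flanking bases, 17-base core promoter, +1 to +6 bases, then 19-21 gene-specific bases) can be assembled programmatically. This is a minimal sketch: the tails are copied from the two sequences given above, while putting the T7 tail on the primer for the 5' end of the sense-strand region (and T3 on the other end) and using 20 gene-specific bases are assumptions; the helper names are hypothetical.

```python
# Promoter tails copied from the two primer sequences above (5'->3'):
# 6 flanking bases + 17-base core promoter + the +1 to +6 transcribed bases.
T7_TAIL = "GAATTG" + "TAATACGACTCACTATA" + "GGGAGG"
T3_TAIL = "AGATGC" + "AATTAACCCTCACTAAA" + "GGGAGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primer(tail: str, gene_specific: str) -> str:
    """Attach a promoter tail to 19-21 gene-specific bases (5'->3')."""
    gs = gene_specific.upper()
    if not 19 <= len(gs) <= 21:
        raise ValueError("gene-specific part should be 19-21 bases")
    if set(gs) - set("ACGT"):
        raise ValueError("gene-specific part may contain only A, C, G, T")
    return tail + gs


def primer_pair(region: str, n: int = 20) -> tuple[str, str]:
    """Hypothetical helper: T7-tailed primer matching the 5' end of the
    sense-strand region, T3-tailed primer annealing to its 3' end."""
    forward = tailed_primer(T7_TAIL, region[:n])
    reverse = tailed_primer(T3_TAIL, reverse_complement(region[-n:]))
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = primer_pair("ATGC" * 10)
    print(fwd)
    print(rev)
```

The reverse primer carries the reverse complement of the region's 3' end so that it anneals to the sense strand, which is standard PCR primer design rather than anything specific to this protocol.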
Amplify the desired sequence using standard PCR conditions, with the first
5 cycles at the annealing temperature best suited to the gene-specific part of the primers
alone (typically in the 50-60°C range), followed by 25 cycles with annealing at 70°C. Check the PCR
products on an agarose gel (3-5 ul of a 100 ul reaction). It is not necessary to quantify at
this stage.

Optional Proteinase K treatment: 

Add 1 ul of Proteinase K (20 mg/ml) (Ambion) to the remainder of the
PCR reaction and incubate for 20 min to 1 h at 50-60°C. This is usually not necessary,
but if the in vitro transcription (IVT) products appear degraded while the control IVT
product included in the kit (described later) is full length, then this step may be added
prior to the microcon-100 purification and IVT.

Microcon 50/100 purification
Other purification methods are being tested. Ethanol precipitation can
be substituted for microcon-50 purification. CAUTION: Microcons may leak. Save all
flow-through portions.

Add 380 ul RNase-free water to the PCR product and concentrate
using a microcon-100 or microcon-50 as suggested in the instructions (Amicon). Repeat
the dilution and concentration 2-3 times. The final concentrated sample should be
5-100 ul.

In vitro transcription labeling with biotin 

For maximum yield use Ambion's T3 (#1338) or T7 (#1334)
Megascript system (their proprietary buffer allows higher nucleotide concentrations
without inhibiting the polymerase). (Read Ambion's instructions and suggestions in the kit
booklet!)

Perform IVT as suggested, but with (1:3) biotinylated:unlabeled CTP
and UTP. Do not interchange the T3 and T7 10X nucleotides that come with the
Megascript kits.

For example, make an NTP mix for 4 IVT-labeling reactions as follows:
8 ul Ambion's T7 10x ATP [75 mM]
8 ul Ambion's T7 10x GTP [75 mM]
6 ul Ambion's T7 10x CTP [75 mM]
6 ul Ambion's T7 10x UTP [75 mM]
15 ul Bio-11-CTP [10 mM] (ENZO #42818)
15 ul Bio-16-UTP [10 mM] (ENZO #42814)
For each IVT-labeling reaction, add (at room temp., not on ice):
14.5 ul NTP mix
2.0 ul 10x T7 transcription buffer (Ambion)
*1.5 ul purified PCR product (not more than 1 ug)
2.0 ul 10x T7 enzyme mix (Ambion)
*Do NOT add more than 1 ug of DNA to the IVT reaction. Higher concentrations of
DNA actually inhibit the reaction and result in LOWER yields.
Final rNTP composition:
7.5 mM ATP
7.5 mM GTP
5.625 mM cold UTP/1.875 mM bio-UTP
5.625 mM cold CTP/1.875 mM bio-CTP
Incubate 4-6 hours at 37°C. Shorter incubation times may be sufficient for some
transcripts or when maximum yield is not important.
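The final rNTP composition follows from the mix recipe by simple dilution (final concentration = stock concentration x stock volume / reaction volume). The sketch below checks it, assuming the 58 ul mix is split evenly over four reactions and that each reaction totals 20 ul (the four component volumes listed above); the variable names are illustrative.

```python
# (volume in ul, stock concentration in mM) for the 4-reaction NTP mix above
NTP_MIX_4_RXN = {
    "ATP": (8.0, 75.0),
    "GTP": (8.0, 75.0),
    "CTP": (6.0, 75.0),
    "UTP": (6.0, 75.0),
    "bio-11-CTP": (15.0, 10.0),
    "bio-16-UTP": (15.0, 10.0),
}
N_REACTIONS = 4
# NTP mix + 10x buffer + template + 10x enzyme mix, per reaction (ul)
REACTION_UL = 14.5 + 2.0 + 1.5 + 2.0


def mix_ul_per_reaction() -> float:
    return sum(vol for vol, _ in NTP_MIX_4_RXN.values()) / N_REACTIONS


def final_mm() -> dict[str, float]:
    """Final concentration (mM) of each nucleotide in one IVT reaction."""
    return {
        name: (vol / N_REACTIONS) * stock / REACTION_UL
        for name, (vol, stock) in NTP_MIX_4_RXN.items()
    }


if __name__ == "__main__":
    print(f"NTP mix per reaction: {mix_ul_per_reaction()} ul")
    for name, mm in final_mm().items():
        print(f"{name}: {mm:.3f} mM")
```

The same arithmetic shows why the labeled:unlabeled ratio is 1:3 (1.875 mM vs 5.625 mM) for both CTP and UTP.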

Optional: DNase I treatment

Add 1 ul RNase-free DNase I (provided with the Ambion kit) to each
reaction and mix well. Incubate 15-20 min at 37°C.


Optional: Proteinase K treatment

This step may help reduce background caused by nonspecific protein
binding to the chip and to streptavidin-phycoerythrin:

Add RNase-free water to IVT reactions to a final volume of 55 ul.
Add 1 ul of Ambion's 20 mg/ml proteinase K.

Incubate at 50°C for 20-30 min.
Microcon purification

Several other purification methods have been tested; many did not
sufficiently remove rNTPs or had low yields. A protocol for carboxy bead-based
purification (Archana Nair) looks very promising and will soon be used in place of
microcon purification.

Note: Set aside an aliquot of the IVT reaction before further
purification. Setting aside 1% will enable troubleshooting of this step if necessary.

1. Add 400 ul DEPC water to the sample and concentrate it with a
microcon 50 or 100 (as suggested by Amicon). SAVE ALL
FLOW-THROUGH FRACTIONS.

2. Repeat dilution/concentration 3-4 times. Final volume can be
10-100 ul.

See comments below. 
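Why several rounds are needed can be seen from an idealized model of free nucleotide removal. The sketch assumes (an assumption, not a measured property of the device) that rNTPs pass the membrane freely, so concentrating does not change their concentration, while each dilution lowers it by V_retained / (V_retained + V_added). Real devices may retain some nucleotide, which is why checking the flow-through A260 remains necessary.

```python
def residual_fraction(start_ul: float, add_ul: float, retained_ul: float, rounds: int) -> float:
    """Fraction of the original free-nucleotide concentration left after
    `rounds` of dilute-then-concentrate, for an idealized solute that passes
    the membrane freely (its concentration is unchanged by concentrating)."""
    fraction = 1.0
    volume = start_ul
    for _ in range(rounds):
        fraction *= volume / (volume + add_ul)  # dilution with fresh water
        volume = retained_ul                    # spin back down to the retentate volume
    return fraction


if __name__ == "__main__":
    for rounds in range(1, 5):
        left = residual_fraction(start_ul=20, add_ul=400, retained_ul=20, rounds=rounds)
        print(f"{rounds} round(s): {left:.2e} of input; ATP ~{7.5 * left * 1000:.4f} uM")
```

Concentrating to a smaller retained volume improves removal per round, in this model, because each 400 ul dilution is then proportionally larger.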



Check IVT product(s) on a gel

Usually it is sufficient to check ~0.01-1% of the reaction on a
nondenaturing agarose/TBE gel. Samples are heated to 65°C for 15 minutes prior to
electrophoresis. A single band close to the expected size is usually observed.
If there is enough space on the gel, run 2 or 3 different dilutions of both the unpurified
and purified IVT products on a gel (~0.01%, 0.1% and 1% of each). Gels can be
stained with Sybr Green II (FMC) at a 1:10,000 dilution in 1x TBE buffer (more
sensitive than ethidium bromide).

If precise determination of transcript size is desired, a denaturing gel
can be run with biotinylated RNA standards (available from Ambion).



Quantify transcript yield by A260

Expect 75-150 ug RNA per 1 ug starting DNA template. For
quantitation of purified transcript, about 1% of the concentrated sample diluted with
water (or TE) into a final volume of 60-70 ul (for a microcuvette) should give
absorbance readings within the accurate range (0.1-1 OD). For accurate pipetting
volumes (>1 ul), it is usually necessary to make a serial dilution first (for example,
make a 1/10 dilution of your RNA sample, then measure 10% of the dilution in 60-70
ul final vol.). Always be sure to take a blank reading in the same cuvette, using
the same buffer/water that the RNA sample is diluted into.
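The yield calculation itself can be sketched as follows. It assumes the standard conversion of 1 A260 unit = 40 ug/ml for single-stranded RNA and a 1 cm path length (both assumptions; check the microcuvette used), and expresses the dilution as the volume of original concentrate that ends up in the cuvette. The function name is hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # standard conversion for single-stranded RNA


def total_rna_ug(a260: float, sample_equiv_ul: float, cuvette_ul: float,
                 concentrated_ul: float, path_cm: float = 1.0) -> float:
    """Total RNA in the concentrated sample from one A260 reading.

    sample_equiv_ul: volume of the original concentrate that ended up in the
    cuvette (e.g. 0.5 ul when 1% of a 50 ul concentrate is read).
    """
    if not 0.1 <= a260 <= 1.0:
        raise ValueError("A260 outside the accurate 0.1-1 OD range; change the dilution")
    ug_per_ml = a260 / path_cm * RNA_UG_PER_ML_PER_A260  # same as ng/ul
    ug_in_cuvette = ug_per_ml * cuvette_ul / 1000.0
    return ug_in_cuvette * concentrated_ul / sample_equiv_ul


if __name__ == "__main__":
    # 1% of a 50 ul concentrate (0.5 ul equivalent) read in 65 ul final volume
    print(f"{total_rna_ug(0.5, 0.5, 65, 50):.0f} ug")
```

With these illustrative numbers the result (130 ug) falls inside the 75-150 ug expected per ug of template; a reading outside 0.1-1 OD is rejected rather than extrapolated.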
Since accurate quantitation of pure transcript is essential for
meaningful spiking experiments, extra care should be taken to verify that excess
nucleotides from the IVT reaction have been sufficiently removed and are not
contributing to the A260.

The microcon flow-through should be saved and checked for A260. If significant
absorbance is present in the last flow-through, the RNA should be subjected to
additional rounds of dilution and concentration until no significant absorbance is
detected at 260 nm.

Since microcon filtration devices occasionally leak, it is advisable to
save all flow-through fractions. If the transcript RNA concentration in the
retained/collected sample is much lower than predicted, the flow-through fractions
can be re-concentrated using a fresh cartridge (then diluted and reconcentrated at least
4 times).

Example 14
Labeling Total mRNA from Cells/Tissues

Starting material: Good quality poly(A)+ RNA from at least 5 x 10^…-1 x
10^… cells (0.1-5 ug poly(A)+). It is more economical to start with more poly(A)+
RNA (up to 5 ug), but if material is limited, as little as 0.1 ug of poly(A)+ can yield a
sufficient quantity of labeled RNA target (10 ug after IVT labeling/amplification).

Double-Stranded cDNA Synthesis:

This protocol is a supplement to the instructions provided with the Gibco BRL
Superscript Choice System. Before proceeding, read the Gibco protocol. Follow Gibco
BRL's Superscript Choice System for cDNA Synthesis, except use the T7-(T)24
sequence (below) for priming the reverse transcription (first-strand cDNA synthesis)
instead of the oligo(dT) or random primers provided with the kit.
T7-(T)24 primer: 5'-GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG-(T)24-3'



First Strand Synthesis

Use 0.1-5 ug poly(A)+ RNA and adjust the amounts of H2O and enzyme
as indicated in the BRL instructions. For example:

1. Combine:
3 ul DEPC-water
4.5 ul (1 ug/ul) mRNA
1 ul (100 pmol/ul) T7-(T)24 primer
2. Mix/spin/incubate at 70°C for 10 minutes.
3. Chill on ice.
4. Add the following components (on ice) to the RNA/primer mix:
4 ul of 5X 1st strand cDNA buffer
2 ul 0.1 M DTT
1 ul 10 mM dNTP mix
Incubate at 37°C for 2 minutes.
5. Add 4.5 ul Superscript II reverse transcriptase and mix well. (Use 1
ul SSII RT per ug RNA; for <1 ug RNA, use 1 ul RT.)
6. Incubate for 1 hour at 37°C.

Final Reaction Composition (20 ul vol.):
50 mM Tris-HCl, pH 8.3
75 mM KCl
3 mM MgCl2
10 mM DTT
500 uM each dATP, dCTP, dGTP, dTTP
100 pmol T7-(T)24 primer
4.5 ug mRNA
900 U RT (200 U per ug mRNA)
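A quick arithmetic check that the example volumes reproduce the final reaction composition: the component volumes sum to 20 ul, and each final value is stock x volume / 20 ul. The 200 U/ul Superscript II stock is an assumption implied by 900 U in 4.5 ul; the dictionary layout and names are illustrative.

```python
# name: (volume in ul, stock concentration or None)
FIRST_STRAND = {
    "DEPC-water": (3.0, None),
    "mRNA": (4.5, 1.0),                # ug/ul
    "T7-(T)24 primer": (1.0, 100.0),   # pmol/ul
    "5X buffer": (4.0, 5.0),           # X
    "DTT": (2.0, 100.0),               # mM (0.1 M stock)
    "dNTP mix": (1.0, 10.0),           # mM of each dNTP
    "Superscript II": (4.5, 200.0),    # U/ul (assumed stock)
}


def summarize(components: dict) -> dict:
    total_ul = sum(vol for vol, _ in components.values())
    amount = {name: vol * stock for name, (vol, stock) in components.items() if stock is not None}
    return {
        "total_ul": total_ul,
        "buffer_x": amount["5X buffer"] / total_ul,
        "dtt_mM": amount["DTT"] / total_ul,
        "dntp_each_uM": 1000 * amount["dNTP mix"] / total_ul,
        "primer_pmol": amount["T7-(T)24 primer"],
        "rt_units": amount["Superscript II"],
        "rt_units_per_ug": amount["Superscript II"] / amount["mRNA"],
    }


if __name__ == "__main__":
    for key, value in summarize(FIRST_STRAND).items():
        print(f"{key}: {value:g}")
```

Scaling the RNA down (for example to 1 ug) and keeping the same 20 ul total only requires adjusting the water entry, which mirrors "adjust the amounts of H2O and enzyme" in the text.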



Second Strand Synthesis 
1. Place first strand reactions on ice (quickly spin down).

2. Add:

91 ul DEPC-H2O
30 ul 5x Second Strand Buffer
3 ul [10 mM] dNTP mix
1 ul [10 U/ul] E. coli DNA Ligase
4 ul [10 U/ul] E. coli DNA Polymerase I
1 ul [2 U/ul] RNase H

Final Composition (150 ul):
25 mM Tris-HCl, pH 7.5
100 mM KCl
5 mM MgCl2
10 mM (NH4)2SO4
0.15 mM beta-NAD+
250 uM each: dATP, dCTP, dGTP, dTTP
1.2 mM DTT
65 U/ml DNA ligase
250 U/ml DNA Polymerase I
13 U/ml RNase H

3. Mix/spin down/incubate at 16°C for 2 hours.

4. Add 2 ul [10 U] T4 DNA Polymerase.

5. Incubate 5 min at 16°C.

6. Add 10 ul 0.5 M EDTA/store at -20°C.
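The volumes in this section can be cross-checked: the stated 150 ul final volume fixes how much water is added in step 2, and adding the T4 DNA polymerase (step 4) and EDTA (step 6) gives the 162 ul that the phenol/chloroform step below extracts with an equal volume. A minimal bookkeeping sketch; the names are illustrative.

```python
FIRST_STRAND_UL = 20.0
FINAL_SECOND_STRAND_UL = 150.0

# Components added in step 2, other than water (ul).
ADDITIONS_UL = {
    "5x Second Strand Buffer": 30.0,
    "10 mM dNTP mix": 3.0,
    "E. coli DNA ligase (10 U/ul)": 1.0,
    "E. coli DNA polymerase I (10 U/ul)": 4.0,
    "RNase H (2 U/ul)": 1.0,
}


def water_ul() -> float:
    """DEPC-H2O needed to bring the reaction to the stated 150 ul."""
    return FINAL_SECOND_STRAND_UL - FIRST_STRAND_UL - sum(ADDITIONS_UL.values())


def volume_after_stop(t4_pol_ul: float = 2.0, edta_ul: float = 10.0) -> float:
    """Volume after adding T4 DNA polymerase (step 4) and EDTA (step 6)."""
    return FINAL_SECOND_STRAND_UL + t4_pol_ul + edta_ul


if __name__ == "__main__":
    print(f"water to add: {water_ul():g} ul")
    print(f"volume entering clean-up: {volume_after_stop():g} ul")
```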

CLEAN UP

Phenol/chloroform extraction

Optional: To reduce sample loss during extraction, see the PLG
protocol below.

1. Add an equal volume (162 ul) of 25:24:1
phenol:chloroform:isoamyl alcohol (saturated with 10 mM
Tris-HCl pH 8.0/1 mM EDTA; Sigma).
2. Vortex and spin for 5 minutes at 14,000 x g. Transfer the aqueous phase to
a fresh 1.5 ml tube.



PLG Phenol/Chloroform Extraction
Phase Lock Gels (PLG)* form an inert, sealed barrier between the
aqueous and organic phases of phenol-chloroform extractions. The solid barrier
allows more complete recovery of the sample (aqueous phase) and minimizes
interface contamination of the sample. PLGs are sold as premeasured aliquots in 1.5
ml tubes, to which the user directly adds sample and phenol-chloroform.
1. Pellet the Phase Lock Gel (1.5 ml tube with PLG I-light) in a
microcentrifuge for 20-30 seconds [PLG I-heavy should also
work, but we haven't specifically tested it for this application].

2. Transfer the entire (162 ul) cDNA sample to the PLG tube.

3. Add an equal volume (162 ul) of 25:24:1 phenol:
chloroform:isoamyl alcohol (saturated with 10 mM Tris-HCl
pH 8.0/1 mM EDTA; Sigma).

4. Mix, but DO NOT VORTEX; the PLG will not become
part of the suspension. Microcentrifuge at full speed (12,000
x g or greater) for 2 min.
5. Transfer the aqueous upper phase to a fresh 1.5 ml tube.

*PLG I is available from 5 Prime-3 Prime, Inc., cat. #pl-175850 for 50 or
#pl-188233 for 200.






Microcon-50 Purification
Other purification methods are being tested. Ethanol precipitation can
be substituted for microcon-50 purification. CAUTION: Microcons may leak. Save
flow-through portions.

1. Add 300 ul of 5 mM Tris pH 7.5 to the sample.

2. Concentrate using a Microcon-50 column
(Microcon-50 columns, Amicon part #47416) following
directions supplied by Amicon.
3. Repeat dilution/concentration 3-4 times; collect and set aside the
flow-through in case of column failure.

Concentrate to a final volume of 5-10 ul if possible, taking care not to allow the
cartridge to spin to dryness. Collect the upper volume.

In Vitro Transcription Labeling with Biotin

For maximum yield use Ambion's T3 (#1338) or T7 (#1334)
Megascript System (their proprietary buffer allows higher nucleotide concentrations
without inhibiting the polymerase).
Perform IVT as suggested, but with (1:3) biotinylated:unlabeled CTP
and UTP. Do not interchange the T3 and T7 10X nucleotides that come with the
Megascript System. Read the Ambion detailed instructions and suggestions before
proceeding.



NTP Labeling Mix

To make NTP labeling mix for 4 IVT-labeling reactions, combine:
8 ul Ambion's T7 10x ATP [75 mM]
8 ul Ambion's T7 10x GTP [75 mM]
6 ul Ambion's T7 10x CTP [75 mM]
6 ul Ambion's T7 10x UTP [75 mM]
15 ul Bio-11-CTP [10 mM] (ENZO #42818)
15 ul Bio-16-UTP [10 mM] (ENZO #42814)



IVT Reaction

1. For each reaction, combine the following at room temperature:

14.5 ul NTP labeling mix
2.0 ul 10x T7 transcription buffer (Ambion)
*1.5 ul ds cDNA (0.1-1 ug is optimal; see note below!)
2.0 ul 10x T7 enzyme mix (Ambion)
*Do NOT add more than 1 ug of ds cDNA to the IVT reaction. Higher concentrations
of DNA actually inhibit the reaction and result in LOWER yields.

Final rNTP Composition:

7.5 mM ATP
7.5 mM GTP
5.625 mM cold UTP/1.875 mM bio-UTP
5.625 mM cold CTP/1.875 mM bio-CTP

2. Incubate 4-6 hours at 37°C. (Shorter incubation times may be
sufficient for some transcripts or when maximum yield is not
important.)

3. Store unused NTP labeling mix at -20°C.

CLEANUP 

Optional DNase I Treatment

1. Add 1 ul RNase-free DNase I (provided with the Ambion kit) to
each reaction and mix well.

Optional Proteinase K Treatment 

This treatment may help reduce background caused by nonspecific
protein binding to the chip and to streptavidin-phycoerythrin.

1. Add RNase-free water to IVT reactions to a final volume of 99 ul.

2. Add 1 ul of Ambion's 20 mg/ml Proteinase K.
3. Incubate at 50°C for 20-30 minutes.

Microcon Purification

Several other purification methods have been tested; many did not
sufficiently remove rNTPs or had low yields. A protocol for carboxy bead-based
purification (Archana Nair) looks very promising and will soon be used in place of
microcon purification. Set aside an aliquot of the IVT reaction before further
purification. Setting aside 1% will enable troubleshooting of this step if necessary.
1. Add 400 ul DEPC water to the sample and concentrate it with a
microcon 50 or 100 (as suggested by Amicon). SAVE ALL
FLOW-THROUGH FRACTIONS.

2. Repeat dilution/concentration 3-4 times. Final volume can be
10-100 ul.

3. Since microcon filtration devices occasionally leak, it is
advisable to save all flow-through fractions. If the transcript RNA
concentration in the retained/collected sample is much lower
than predicted, the flow-through fractions can be
reconcentrated using a fresh column, then diluted and
reconcentrated at least 4 times.

1. Starting with 4-5 ug poly(A)+ RNA for the ds cDNA synthesis and
using 20% of the purified ds cDNA sample for the IVT, expect
~75-125 ug labeled RNA per IVT reaction.

2. Reading 1% of the concentrated sample diluted with water (or
TE) into a final volume of 60-70 ul (for a microcuvette) should
give absorbance data within the accurate range (0.1-1 OD). For
accurate pipetting volumes (>1 ul), it is usually necessary to
make a serial dilution first. For example, make a 1/10 dilution
of your RNA sample, then measure 10% of the dilution in 60-70
ul final volume. Be sure to take blank readings in the same
cuvette and use the same buffer/water that was used for diluting
the RNA sample.

3. For accurate quantitation of labeled RNA, extra care should be
taken to verify that excess nucleotides from the IVT reaction
have been sufficiently removed and are not contributing to the
A260.

The microcon flow-through should be saved and checked for A260. If significant
absorbance is present in the last flow-through, the RNA should be subjected to
additional rounds of dilution and concentration until no significant absorbance is
detected at 260 nm.



Check unfragmented samples on gel

Electrophorese the labeled RNA before fragmentation to observe the
size distribution of labeled transcripts. Samples can be heated to 65°C for 15 minutes
and electrophoresed on agarose/TBE gels to get an approximate idea of the transcript
size range. If there is enough space on the gel, run 2 or 3 different dilutions of both the
unpurified and purified IVT products on a gel (~0.01%, 0.1% and 1% of each). Gels
can be stained with Sybr Green II (FMC) at a 1:10,000 dilution in 1x TBE buffer
(more sensitive than ethidium bromide).

Alternatively, for more accurate estimations of the size distribution of
the RNA population pre- and post-fragmentation, electrophorese samples through a
denaturing gel using biotinylated RNA molecular weight markers (Ambion).

Example 15

Direct Labeling of RNA with Psoralen-Biotin
The psoralen-biotin reagent comes lyophilized and can be bought
separately or as part of the "Rad-Free Universal Oligo Labeling and Hybridization Kit"
(Schleicher & Schuell). It is actually cheaper (per nmole) when bought with the kit, so
you might as well get the extra kit components and save money. The Rad-Free
Universal Oligo Labeling and Hybridization Kit: catalog #483122 (contains 20
nmoles of psoralen-biotin). The same kit with a UV long-wave 365 nm lamp:
#483124.

1. Spin down, then resuspend the lyophilized psoralen-biotin in either:

a) 14 ul of DMF, if you may label fragmented DNA/RNA
or oligonucleotides with some of the reagent (it needs to
be more concentrated), OR

b) 56 ul of DMF, if you will label DNA/RNA before
fragmentation.
Labeling has been performed both before and after fragmenting
with similar results, but it is easier to do before fragmentation
because the material can't be labeled in high salt (>20 mM).

2. Adjust the RNA/DNA concentration to 0.5 ug/10 ul (200 ul
for 10 ug of DNA), in less than 20 mM salt. pH does not matter
(pH 2.5-10), so you can just use sterile or DEPC-treated water to
resuspend or dilute the RNA/DNA. Plasmid DNA needs
to be linearized.

If the RNA/DNA is in high salt, it can be diluted and concentrated
using the appropriate size of microcon (even a microcon 3 works
for fragmented material but takes ~70 min per cycle).

3. Boil sample 10 min./quick chill on ice (store on ice 5 min-3hrs) 
[important - ds DNA will become cross-linked by reagent if 
strands are not separated before labeling] 

4. In dim light, add 1 ul of psoralen-biotin reagent per 20 ul of
DNA/RNA solution (1 ul of psoralen-biotin resuspended
in 56 ul DMF per ug DNA/RNA). *If the psoralen-biotin was
resuspended in 14 ul, dilute the amount you will need for
labeling 1:3 in DMF (1 ul conc. psoralen-biotin + 3 ul DMF).

5. Transfer the solution into a well of a 96-well microplate on ice
(up to 150 ul/well).
6. Place the 365 nm UV lamp directly on top of the plate so that the light
source is about 2 cm from the sample. Irradiate samples for
one hour.

7. Transfer samples to microcentrifuge tubes and add 2 volumes
of H2O-saturated n-butanol to extract unincorporated psoralen.

8. Discard the butanol (top layer). Repeat the extraction.

9. Fragment as you would normally. Denature as normal before
hybridization (10 min, 99-100°C).

* Longer UV irradiation does not improve results.
* Adding more psoralen-biotin per ug DNA/RNA does not seem to improve results.
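The volumes in steps 2, 4 and 5 all scale with the amount of nucleic acid. A minimal planning sketch, assuming the 20 nmol vial size quoted above, the 0.5 ug/10 ul sample concentration, 1 ul working reagent per 20 ul sample, and the 150 ul/well limit; the function and key names are hypothetical.

```python
import math

VIAL_NMOL = 20.0              # psoralen-biotin per vial, as quoted above
SAMPLE_UL_PER_UG = 20.0       # 0.5 ug per 10 ul
SAMPLE_UL_PER_REAGENT_UL = 20.0
MAX_UL_PER_WELL = 150.0


def labeling_plan(nucleic_acid_ug: float, resuspended_in_ul: float = 56.0) -> dict:
    if resuspended_in_ul not in (14.0, 56.0):
        raise ValueError("protocol describes resuspension in 14 ul or 56 ul DMF")
    sample_ul = nucleic_acid_ug * SAMPLE_UL_PER_UG
    reagent_ul = sample_ul / SAMPLE_UL_PER_REAGENT_UL  # 56 ul-strength reagent
    plan = {
        "sample_ul": sample_ul,
        "reagent_ul": reagent_ul,
        "reagent_nmol": reagent_ul * VIAL_NMOL / 56.0,
        "wells": math.ceil(sample_ul / MAX_UL_PER_WELL),
    }
    if resuspended_in_ul == 14.0:
        # 1:3 in DMF (1 part concentrate + 3 parts DMF) restores the 56 ul strength
        plan["concentrate_ul"] = reagent_ul / 4.0
        plan["dmf_ul"] = reagent_ul * 3.0 / 4.0
    return plan


if __name__ == "__main__":
    print(labeling_plan(10.0))
    print(labeling_plan(10.0, resuspended_in_ul=14.0))
```

For 10 ug this gives 200 ul of sample split over two wells and 10 ul of working reagent (about 3.6 nmol), in line with the example in step 2.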



Results of hybridization to chip (5 pM each): PB-labeled targets showed
approximately 5x lower intensities than IVT (bio-U+C) labeled targets.

Labeling before vs. after fragmentation: 

No significant difference in hybridization intensities 

Ratio of psoralen-biotin to RNA 

Labeling with a 4x higher ratio of PB : RNA does not significantly 
affect hybridization intensities on chips. 

Time of labeling reaction/UV lamp intensity

No significant difference between 1 vs. 3 hr labeling, or between 15-20
mW/cm2 (Affy lamp) vs. 5-7 mW/cm2 (S&S lamp) intensity at 365 nm.

Psoralen-biotin 

Psoralens: planar, tricyclic compounds 

Psoralen-biotin: psoralen conjugated to biotin via a 14-atom linker arm.
High affinity for nucleic acids
Intercalates into DNA/RNA



Example 16 

Psoralen-Biotin Labeling Experiments 

Labeling RNA by standard protocol 

Pool of 4 different fragmented RNA transcripts labeled with psoralen-biotin




Example 17
Terminal Transferase End-Labeling Protocol
(This protocol has been thoroughly tested and optimized only with PRT440S
chips.)
DNase fragmentation
This will be enough for 4 labeling reactions:

4 pmol of HIV PCR target (3.17 ug of 1.2 kb insert) X ul
DNase (BRL) X ul (1 U/ug)
Calf Alkaline Phosphatase, 1 U/ul (BRL) ?.5 ul (?.5 U/rxn)
Dilution CAP Buffer (BRL) 2.5 ul
MgCl2 X ul (1.25 mM)

Bring up with H2O to 100 ul.
37°C for 15 min.
95°C for 10 min.
4°C on hold.
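The 3.17 ug figure follows from the common approximation of about 660 g/mol per base pair of double-stranded DNA (an assumption of this sketch): 4 pmol x 1,200 bp x 660 g/mol = 3.168 ug. The same bookkeeping shows that 25 ul of the 100 ul digest carries 1 pmol into each TdT labeling reaction. The helper names are illustrative.

```python
BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one base pair of dsDNA


def dsdna_ug(pmol: float, length_bp: float) -> float:
    """Mass (ug) of `pmol` picomoles of a dsDNA fragment of `length_bp`."""
    return pmol * 1e-12 * length_bp * BP_MASS_G_PER_MOL * 1e6


def dsdna_pmol(ug: float, length_bp: float) -> float:
    """Inverse conversion: picomoles in `ug` micrograms of dsDNA."""
    return ug * 1e-6 / (length_bp * BP_MASS_G_PER_MOL) * 1e12


if __name__ == "__main__":
    print(f"4 pmol of a 1.2 kb fragment = {dsdna_ug(4, 1200):.2f} ug")
    print(f"pmol in 25 ul of the 100 ul digest: {4 * 25 / 100:.1f}")
```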

TdT Labeling 

F-N6-ddATP, F-ddATP, F-ddCTP, and F-ddUTP label comparably
in the reaction. We decided to use F-N6-ddATP.

Fragmented DNA sample 25 ul (1 pmol)
5X TdT Buffer (Boehringer) 20 ul (1X)
F-N6-ddATP (1 mM) 1 ul (10 uM)
TdT (25 U/ul) (Boehringer) 1 ul (25 U/rxn)

H2O 53 ul
37°C for 30 min.
95°C for 5 min.
4°C on hold.

PRT440S Hybridization (Rela Station) 

Labeled sample 100 ul

Control (100 nM) 213 Oligos 5 ul

■JA It r\ IOC. I 

AC°r^ TT,.u -yn ;_ 

20°C wash with 6X SSPE, 0.005% Triton X-100; 4 cycles / 10 drain-fill.
Scan chip at 530 nm, 11.25 um pixel size.
Example 18
Alternate Labeling Procedures

Ligation assay

RNA can be directly labeled by ligating an A6 RNA oligonucleotide
carrying biotin at the 5' end using RNA ligase. Cre, a bacterial gene, was transcribed with
T7 RNA polymerase to generate an antisense RNA. The RNA was fragmented and
kinased with polynucleotide kinase to generate 5' phosphorylated ends. The biotin-A6
RNA was then ligated using T4 RNA ligase. 5 pM of ligated RNA was tested on gene
expression chips along with the labeled Cre.



Direct labeling of RNA 3' ends using poly(A) polymerase

Poly(A) polymerase has been used to catalyze the addition of a poly(A) tail onto the free 3'
hydroxyl terminus of RNA, using ATP as a precursor. Recently, Joomyeong Kim et al.
(1995) Nucl. Acids Res., 23(12): 2245-2251, reported that they
successfully used poly(A) polymerase to tail RNA 3' ends with CTP. This method can be
used to label fragmented RNA with biotin-CTP to generate labeled target.

The advantage of this method is that sense RNA (mRNA) can be
directly labeled with biotin-CTP. Antisense RNA can also be labeled after
fragmentation. The consumption of CTP can be cut down by 1/5 compared to an
IVT reaction.

Example 19 

Direct Labeling Protocol 

Reagents for direct labeling of mRNA
1) 100 uM rATP, 200 ul:
198 ul DEPC-H2O
2 ul 10 mM rATP

2) 100 ug/ml BSA

3) 30 mM DTT

4) 10 U/ul polynucleotide kinase
Boehringer Mannheim, 3'-phosphatase-free, cat. #83829

5) 1 nmole/ul BioA6
Genetics Institute

6) 5 U/ul T4 RNA Ligase + 10X T4 RNA Ligase Buffer
Epicentre Technologies, catalogue #LR5025
7) 5X RNA Fragmentation Buffer
700 mM Tris-Acetate, pH 8.1
500 mM KOAc
150 mM MgOAc
Direct Labeling Protocol 
Fragmentation

Add to a 1.5 ml sterile tube:

8 ul poly(A)+ RNA in DEPC-H2O (1 ug)
2 ul 5X RNA Fragmentation Buffer
Heat to 94°C for 35 minutes.
Kinase Reaction

Add to the 10 ul fragmented RNA:
2.4 ul rATP (100 uM)
2 ul BSA (100 ug/ml)
2 ul DTT (30 mM)
1.6 ul DEPC-H2O

2 ul polynucleotide kinase (10 U/ul)

Incubate at 37°C for 2.5 hours. Heat to 94°C for 2 minutes (heat-kill the
enzyme).

T4 RNA Ligase Reaction
Add to the 20 ul kinased RNA:

0.5 ul BioA6 (1 nmole/ul in DEPC-H2O)

3 ul rATP (19 mM)

0.5 ul DEPC-H2O
Incubate at 17°C overnight to 2 days. Heat to 94°C for 2 minutes.

Example 20

Computer Algorithms to Perform Basecalling on a Target DNA
Sample Hybridized or Ligated to Generic DNA Arrays

Resequencing a DNA target by generating a set of n electronic tiling arrays on an
n-mer generic DNA array.

This method of resequencing the target is similar to the method used
with customized resequencing GeneChips, except that unlike the custom GeneChips,
which physically place a single series of tiling probes on the chip, with a generic
GeneChip a computer electronically reconstructs a set of n tiling arrays by fetching
the appropriate probe information from the generic array (a generic array contains all
possible n-mer sequences). In general, to resequence a target DNA, the target is
decomposed into an n-mer complement word spectrum of tiling probes. For each
tiling probe, there exists a set of "first order nearest-neighbor" tiling probes (probes
containing a single base substitution) on the generic chip (generic chips also contain
higher order nearest neighbors). This process is termed tiling through the target
sequence with n-mer words (Fig. 24). To make a basecall at a given position within
the target, the intensity of the tiling probe at that position is compared to the
intensities of its "nearest-neighbors" at that position. There are n sets of such
"nearest-neighbors" because the single base substitution can occur at n different
positions within the probe. The base substitution at a particular position within the
probe that yields the highest intensity determines the basecall at that position of the
probe (Fig. 25). The advantage of using a generic DNA array vs. the standard custom
GeneChips is the high degree of redundancy achieved for each basecall of the target.
An n-mer generic array makes n basecalls for each base within the target, whereas
the custom resequencing GeneChips make only a single basecall.

The final basecall of a target base is decided upon by an electronic vote 
of the base calls from the n different electronic tilings at each target position (Fig. 26). 
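The tiling-and-voting procedure can be sketched in a few lines. This is a minimal illustration, not the software described here: it assumes the array intensities are available as a lookup from n-mer word (written in target sense, which hides the probe/target complement step) to signal, takes "highest intensity wins" as each tiling's call, and breaks ties arbitrarily. All names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def electronic_tilings(reference: str, intensity: dict[str, float], n: int) -> list[dict[int, str]]:
    """Build n electronic tilings from a generic n-mer array.

    Tiling k substitutes the base at offset k of every n-mer window of the
    reference and calls the base whose variant probe is brightest.
    """
    tilings: list[dict[int, str]] = [{} for _ in range(n)]
    for start in range(len(reference) - n + 1):
        word = reference[start:start + n]
        for k in range(n):
            scores = {b: intensity.get(word[:k] + b + word[k + 1:], 0.0) for b in BASES}
            tilings[k][start + k] = max(scores, key=scores.get)
    return tilings


def vote(tilings: list[dict[int, str]], length: int) -> str:
    """Majority vote over the (up to n) calls made for each target position."""
    calls = []
    for pos in range(length):
        votes = Counter(t[pos] for t in tilings if pos in t)
        calls.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(calls)


if __name__ == "__main__":
    seq = "GATTACA"
    signal = {seq[i:i + 3]: 1.0 for i in range(len(seq) - 2)}  # idealized perfect-match signal
    signal["GAG"] = 5.0  # one spurious bright mismatch probe
    tilings = electronic_tilings(seq, signal, 3)
    print(tilings[2][2], vote(tilings, len(seq)))
```

In this toy case one tiling miscalls position 2 because of the spurious probe, but the other two tilings outvote it, which is the redundancy argument made above.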


Empirically using the accuracy of the basecalls derived from the n electronic tiling
arrays to filter out inaccurate electronic tilings.

A given reference DNA sample is hybridized/ligated to a generic DNA
array. A set of n electronic tilings is generated and the corresponding basecalls are
made. A correctness score table is constructed by giving a score of 1 if a given tiling
substitution series makes a correct basecall or a score of 0 if the basecall is incorrect
(Fig. 27). A confidence level for a given basecall can also be attached to each score
according to the ratio of the intensities of the base substitutions for any given basecall.

A variant DNA sample is then hybridized/ligated to a second generic DNA array.
Again a set of n electronic tilings is generated, except this time all tilings which
have a 0 correctness score are discarded, and only those tilings which have a
correctness score of 1 are included in the overall base voting procedure (Fig. 28). The
result is a dramatic improvement in the overall percentage of correct basecalls.
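A sketch of the correctness-filtered vote, assuming each electronic tiling's calls are already available as a position-to-base map (for example from a tiling step like the one described earlier). The optional intensity-ratio confidence weighting mentioned above is left out, and all names are hypothetical.

```python
from collections import Counter

Calls = list[dict[int, str]]  # one dict per electronic tiling: position -> called base


def correctness_table(reference_calls: Calls, reference: str) -> list[dict[int, int]]:
    """Score 1 where a tiling called the known reference base correctly, else 0."""
    return [
        {pos: int(base == reference[pos]) for pos, base in tiling.items()}
        for tiling in reference_calls
    ]


def filtered_vote(sample_calls: Calls, table: list[dict[int, int]], length: int) -> str:
    """Vote only with tilings that scored 1 at that position on the reference."""
    out = []
    for pos in range(length):
        votes = Counter(
            tiling[pos]
            for k, tiling in enumerate(sample_calls)
            if pos in tiling and table[k].get(pos, 0) == 1
        )
        out.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(out)


if __name__ == "__main__":
    reference = "ACG"
    # tilings 1 and 2 systematically miscall position 0 as T
    ref_calls = [{0: "A", 1: "C", 2: "G"}, {0: "T", 1: "C", 2: "G"}, {0: "T", 1: "C", 2: "G"}]
    table = correctness_table(ref_calls, reference)
    # variant sample with a real A->G change at position 0
    sample_calls = [{0: "G", 1: "C", 2: "G"}, {0: "T", 1: "C", 2: "G"}, {0: "T", 1: "C", 2: "G"}]
    print(filtered_vote(sample_calls, table, 3))
```

An unfiltered vote on the same sample would call position 0 as T, because the two unreliable tilings outnumber the reliable one.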

Comparing "locally" normalized tiling probe intensities between a reference sample
and a variant is a sensitive method of detecting a mutation.

For a given n-mer generic array, the ability to correctly resequence a
target decreases as the complexity of the target increases. As the target complexity
increases, the number of n-mer tiling probes which repeat themselves within the target
increases, the cross-talk between nearest neighbors at different positions increases,
and the overall cross hybridization increases. All these factors contribute to miscalls
of the bases within the target. The comparison of a sample target against a reference
target provides a powerful way to "filter out" all the non-specific noise via difference
analysis.

One method of comparison between the reference and sample is to
compare the intensities of the tiling probes themselves. However, before a direct
comparison can be made, the intensities have to be normalized in some manner to
account for both chip-to-chip and sample-to-sample variation. I employed a "local"
normalization process to normalize the signals. By "local" normalization, I simply
divide the intensity of the tiling probe by the sum of the intensities of its nearest
neighbors (Fig. 29).
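The local normalization step can be sketched as below. It takes "its nearest neighbors" to mean all 3n first-order single-base substitutions of the tiling probe (one reading of the text, stated here as an assumption); the intensity lookup by target-sense word and the function names are illustrative. Comparing the resulting profiles between a reference chip and a variant chip, for example as a per-position ratio, is one way to look for the signal change described next.

```python
BASES = "ACGT"


def first_order_neighbors(word: str) -> list[str]:
    """All 3n single-base substitutions of an n-mer."""
    return [word[:k] + b + word[k + 1:] for k in range(len(word)) for b in BASES if b != word[k]]


def local_normalize(word: str, intensity: dict[str, float]) -> float:
    """Tiling-probe intensity divided by the summed intensity of its first-order neighbors."""
    denom = sum(intensity.get(v, 0.0) for v in first_order_neighbors(word))
    return intensity.get(word, 0.0) / denom if denom > 0 else float("nan")


def normalized_profile(reference: str, intensity: dict[str, float], n: int) -> list[float]:
    """Locally normalized signal for each n-mer window along the reference."""
    return [local_normalize(reference[i:i + n], intensity) for i in range(len(reference) - n + 1)]


if __name__ == "__main__":
    ref_chip = {w: 1.0 for w in first_order_neighbors("AC")}
    ref_chip["AC"] = 6.0
    sample_chip = dict(ref_chip, AC=3.0)  # tiling probe drops in the variant sample
    print(local_normalize("AC", ref_chip), local_normalize("AC", sample_chip))
```

Because each probe is divided by signals measured on the same chip, overall chip-to-chip brightness differences largely cancel, which is the motivation given in the text.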

This method of normalization creates good signal tracking between samples and is
quite sensitive to the presence of a mutation, indicated by the formation of a "bubble"
(Fig. 30). The "locally" normalized tiling probe comparison can be further
transformed by difference analysis and smoothing to a format where the presence of a
mutation can be readily identified.

Induced difference method for detecting mutations.

Another method for using comparisons between a reference and a
sample to detect mutations is via mutational "induced differences" between tiling
probes and their nearest neighbors. Application of this method to a first order nearest
neighbor tiling analysis involves comparing "locally normalized" probes in the
reference target to the corresponding probes in the sample target. Tilings that were
uninformative in part II, because they miscalled the base, may now be informative
because certain probe members within that tiling can be induced (caused to increase or
decrease in intensity) between the reference and the sample, indicating the presence of
a mutation (Fig. 31). These inductions are summed over all the tilings on both the
forward and reverse strand for a given target position, and the resultant number is a
measure of whether a mutation is present or not (Fig. 32, Fig. 33).
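The induced-difference tally can be sketched as follows, assuming each tiling that covers a target position is summarized as a map from substituted base to its locally normalized intensity. The two-fold (natural log 2) threshold for calling a probe "induced" is a hypothetical choice, not a value from the text, and the names are illustrative.

```python
import math


def induction_count(reference_tiling: dict[str, float], sample_tiling: dict[str, float],
                    threshold: float = math.log(2)) -> int:
    """Count probe members whose locally normalized signal changes by more than
    `threshold` (natural-log units) between reference and sample."""
    count = 0
    for base, ref_value in reference_tiling.items():
        sample_value = sample_tiling.get(base, 0.0)
        if ref_value > 0 and sample_value > 0 and abs(math.log(sample_value / ref_value)) > threshold:
            count += 1
    return count


def mutation_score(reference_tilings: list[dict[str, float]], sample_tilings: list[dict[str, float]],
                   threshold: float = math.log(2)) -> int:
    """Sum inductions over all tilings (forward and reverse strand) covering one position."""
    return sum(
        induction_count(ref, smp, threshold)
        for ref, smp in zip(reference_tilings, sample_tilings)
    )


if __name__ == "__main__":
    ref = {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1}
    mutant = {"A": 0.2, "C": 0.1, "G": 1.0, "T": 0.1}  # A probe drops, G probe rises
    print(mutation_score([ref, ref], [mutant, mutant]), mutation_score([ref, ref], [ref, ref]))
```

A tiling can contribute inductions here even if its own basecall was wrong, which is the point made above about previously uninformative tilings.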



Example 21

Use of inosine on the 5' ends of the MenPoc synthesized probes to
increase duplex stability and increase the resultant ligation signal on
Generic Ligation GeneChips.

We investigated adding degenerate bases, such as inosine
(which pairs with all other bases), to the end of the MenPoc synthesized probes to increase
duplex stability. We found that the addition of 1-6 inosines onto the end of
the probes did increase the signal intensity in both hybridization and ligation
reactions on a Generic Ligation GeneChip and allowed us to ligate at higher
temperatures.

Inosines (0-6) are placed at the 5' end of the probe during
manufacturing, and the effects of these terminal inosines are assayed by ligating a
DNase I-digested, TdT-labeled 788 bp DNA fragment to the chips. The increased
brightness with 2-6 inosines indicated an enhancement of duplex stability. With 6
inosines there is a slight decrease in intensity compared to 2-4 inosines, probably because the
inosines are starting to form quartet-like secondary structures.



Example 22

Comparing the specificity of T4 ligase and Taq ligase when
used on a Generic Ligation GeneChip.
We investigated whether T4 ligase or Taq ligase was more specific in
ligating target to the Generic Ligation GeneChip. In order to use Taq ligase, we need to
perform the ligation reaction at 40 degrees C or higher. Consequently, we used an
8-mer chip with 6 inosines at the end of the MenPoc probes to increase the thermal
stability of the duplexes. This allowed us to perform the Taq ligase reaction at 44
degrees C and compare this to a T4 ligation reaction at 37 degrees C. Our results
indicated that Taq is much more specific than T4 ligase, and ligates a set of target
ends that T4 ligase is unable to ligate.

Taq lights up fewer features than T4, but with a brighter intensity,
indicating the greater specificity of Taq relative to T4.

Intensity profiles of the tiling probes and nearest-neighbor substitutions
at given probe positions within the target illustrate that Taq is more specific than T4
and that Taq detects signal at probes where T4 fails to detect any.
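One way to put numbers on the "fewer features, brighter intensity" observation is sketched below in Python. It is a minimal illustration, not the analysis used in this example: the function name `lit_feature_summary`, the intensity mapping it takes, and the background `threshold` are all hypothetical, and the scan values shown are toy inputs rather than data from this experiment.

```python
from statistics import mean


def lit_feature_summary(intensities, threshold):
    """Count "lit" features and their mean intensity.

    intensities: hypothetical mapping of feature (probe) ID -> scanned intensity.
    threshold: hypothetical background cutoff; features above it count as lit.
    Returns (number of lit features, mean intensity of the lit features).
    """
    lit = [value for value in intensities.values() if value > threshold]
    if not lit:
        return 0, 0.0
    return len(lit), mean(lit)


# Toy inputs for illustration only (not data from this example).
t4_scan = {"p1": 120.0, "p2": 130.0, "p3": 110.0, "p4": 20.0}
taq_scan = {"p1": 400.0, "p2": 15.0, "p3": 10.0, "p4": 20.0}
print(lit_feature_summary(t4_scan, threshold=50.0))
print(lit_feature_summary(taq_scan, threshold=50.0))
```

Under this reading, a more specific ligase would show a lower lit-feature count together with a higher mean intensity over the features it does light up.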

It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof
will be suggested to persons skilled in the art and are to be included within the spirit
and purview of this application and scope of the appended claims. All publications,
patents, and patent applications cited herein are hereby incorporated by reference for
all purposes.
WHAT IS CLAIMED IS:

1. A method of identifying differences in nucleic acid levels between
two or more nucleic acid samples, said method comprising the steps of:

(a) providing one or more oligonucleotide arrays, said arrays
comprising probe oligonucleotides attached to a surface;

(b) hybridizing said nucleic acid samples to said one or more
arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and
probe oligonucleotides in said one or more arrays that are complementary to said
nucleic acids or subsequences thereof;

(c) contacting said one or more arrays with a nucleic acid
ligase; and

(d) determining differences in hybridization between said
nucleic acid samples wherein said differences in hybridization indicate differences in
said nucleic acid levels.

2. The method of claim 1, further comprising contacting said
oligonucleotide arrays with one or more ligatable oligonucleotides.

3. The method of claim 2, wherein said ligatable oligonucleotides are a
pool of all possible oligonucleotides of a preselected length.

4. The method of claim 2, wherein said determining comprises
detecting one or more of said ligatable oligonucleotides attached to said array.
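For claim 3, a "pool of all possible oligonucleotides of a preselected length" over the four DNA bases contains 4^n sequences for length n. The Python sketch below enumerates such a pool lazily with `itertools.product`; the function names are illustrative only, and the DNA alphabet ACGT is an assumption (an RNA pool would use ACGU).

```python
from itertools import product


def all_oligonucleotides(length, alphabet="ACGT"):
    """Yield every oligonucleotide of `length` over `alphabet`.

    There are len(alphabet) ** length of them, generated in lexicographic order.
    """
    for bases in product(alphabet, repeat=length):
        yield "".join(bases)


def pool_size(length, alphabet_size=4):
    """Number of distinct oligonucleotides of a given length."""
    return alphabet_size ** length


# Pool sizes for the 6-mer to 12-mer lengths recited in claim 32.
for n in range(6, 13):
    print(n, pool_size(n))
```

The pool grows from 4,096 sequences for 6-mers to 16,777,216 for 12-mers, which is why the sketch yields sequences one at a time instead of building a list.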




5. The method of claim 1, wherein said one or more arrays is at least
two arrays and said arrays are essentially the same in probe oligonucleotide
composition.

6. The method of claim 5, wherein the spatial arrangement of said
probe oligonucleotides is essentially the same in said arrays.

7. The method of claim 1, wherein each of said nucleic acid samples is
hybridized to a different array, the different arrays having substantially the same
probe oligonucleotide composition.




WO 97/27317 PCT/US97/01603

8. The method of claim 1, wherein two or more of said nucleic acid
samples are hybridized to a single oligonucleotide array.

9. The method of claim 8, wherein said nucleic acid samples are
simultaneously hybridized to a single oligonucleotide array.

10. The method of claim 1, wherein said probe oligonucleotides are
pairs of probe oligonucleotides that differ from each other in preselected nucleotides.

11. The method of claim 10, wherein said pairs of probe
oligonucleotides differ from each other in a single nucleotide.

12. The method of claim 10, wherein said determining comprises
determining the difference in sample nucleic acid hybridization intensity between the
members of said pairs of probe oligonucleotides.
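Claim 12's determining step, taking the difference in hybridization intensity between the two members of each probe pair, can be sketched in Python as follows. The pair identifiers, the (first member, second member) tuple layout, and contrasting two samples by subtracting their per-pair differences are assumptions made for illustration; the claim does not specify a data format or a comparison statistic.

```python
def pair_differences(pairs):
    """Per-pair intensity difference (first member minus second member).

    pairs: hypothetical mapping of pair ID -> (first_intensity, second_intensity),
    e.g. a probe and its partner that differs at a preselected nucleotide.
    """
    return {pair_id: first - second for pair_id, (first, second) in pairs.items()}


def sample_contrast(sample_a, sample_b):
    """Difference of per-pair differences between two samples (A minus B),
    computed only for pair IDs measured in both samples."""
    diff_a = pair_differences(sample_a)
    diff_b = pair_differences(sample_b)
    return {pid: diff_a[pid] - diff_b[pid] for pid in diff_a.keys() & diff_b.keys()}


# Toy intensities for illustration only.
sample_a = {"x": (500.0, 100.0), "y": (80.0, 70.0)}
sample_b = {"x": (200.0, 90.0), "y": (85.0, 75.0)}
print(sample_contrast(sample_a, sample_b))
```

Working with the within-pair difference rather than a single probe's raw intensity is one way to discount signal that both pair members pick up nonspecifically.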

13. A method of identifying differences in nucleic acid levels between
two or more nucleic acid samples, said method comprising the steps of:

(a) providing one or more oligonucleotide arrays comprising
probe oligonucleotides wherein said probe oligonucleotides comprise a constant
region and a variable region;

(b) hybridizing said nucleic acid samples to said one or more
arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and
said variable regions that are complementary to said nucleic acids or subsequences
thereof; and

(c) determining differences in hybridization between said
nucleic acid samples wherein said differences in hybridization indicate differences in
said nucleic acid levels.



14. The method of claim 13, wherein said variable region varies in
length from about 3 nucleotides to about 50 nucleotides.



15. The method of claim 13, wherein the variable regions of said 
probe oligonucleotides comprise all possible oligonucleotides of a preselected length. 



16. The method of claim 15, wherein said variable regions are at least
5 nucleotides in length.

17. The method of claim 13, wherein said constant region ranges in
length from 3 nucleotides to about 25 nucleotides.

18. The method of claim 13, wherein said constant regions comprise a 
nucleotide sequence complementary to a sense or antisense sequence of the 
recognition site of a restriction endonuclease. 
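Claim 18's constant region is complementary to either the sense or the antisense sequence of a restriction endonuclease recognition site. Written 5' to 3', the sequence complementary to one strand is that strand's reverse complement, so the two candidates can be computed as in the Python sketch below. EcoRI's GAATTC is used only as a familiar example: it is palindromic, so the two candidates coincide. The claim names no particular enzyme, and the helper names are illustrative.

```python
# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the 5'->3' sequence that base-pairs with `seq` (uppercase DNA)."""
    return seq.translate(_COMPLEMENT)[::-1]


def constant_region_candidates(site):
    """Sequences complementary to the sense and to the antisense strand of a site.

    `site` is the recognition sequence written 5'->3' on the sense strand.
    """
    antisense = reverse_complement(site)
    return {
        "complementary_to_sense": antisense,
        "complementary_to_antisense": reverse_complement(antisense),
    }


print(constant_region_candidates("GAATTC"))  # EcoRI site (palindromic)
print(constant_region_candidates("GACGC"))   # a non-palindromic example sequence
```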



19. The method of claim 13, further comprising contacting said
oligonucleotide arrays with a constant oligonucleotide complementary to said constant
region or a subsequence thereof.

20. The method of claim 19, comprising contacting said array with a 

ligase. 



21. The method of claim 19, wherein said determining comprises
detecting a nucleic acid of said nucleic acid samples attached to said constant
oligonucleotide.

22. The method of claim 13, wherein said probe oligonucleotides are
pairs of probe oligonucleotides that differ from each other in preselected nucleotides.

23. The method of claim 22, wherein said determining comprises
determining the difference in sample nucleic acid hybridization intensity between the
members of said pairs of probe oligonucleotides.

24. A method of identifying differences in nucleic acid levels between
two or more nucleic acid samples, said method comprising the steps of:

(a) providing one or more oligonucleotide arrays
comprising pairs of probe oligonucleotides where the members of each pair of probe
oligonucleotides differ from each other in preselected nucleotides;

(b) hybridizing said nucleic acid samples to said one or more
arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and
probe oligonucleotides in said one or more arrays that are complementary to said
nucleic acids or subsequences thereof;

(c) determining the differences in hybridization between said
nucleic acid samples wherein said differences in hybridization indicate differences in
said nucleic acid levels.

25. The method of claim 24, wherein said members of each pair of
probe oligonucleotides differ from each other in a centrally located nucleotide.
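For claim 25, a pair member that differs from its partner at a centrally located nucleotide could be generated as in the Python sketch below. Substituting the complementary base at the center is an assumption (the claim only requires that the members differ at that position), as is taking index len(probe) // 2 as the center. The function name is hypothetical.

```python
# Assumed substitution rule: replace the central base with its complement.
_SUBSTITUTE = {"A": "T", "T": "A", "C": "G", "G": "C"}


def central_mismatch_partner(probe):
    """Return a partner that differs from `probe` only at its central base.

    The center is index len(probe) // 2; for even lengths this is the
    right-hand of the two middle bases.
    """
    if not probe:
        raise ValueError("probe must be non-empty")
    center = len(probe) // 2
    return probe[:center] + _SUBSTITUTE[probe[center]] + probe[center + 1:]


probe = "ACGTA"
partner = central_mismatch_partner(probe)
print(probe, partner)
```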

26. A method of identifying differences in nucleic acid levels between 
two or more nucleic acid samples, said method comprising the steps of: 

(a) providing one or more oligonucleotide arrays, each
array comprising more than 100 different probe oligonucleotides wherein:

each different probe oligonucleotide is localized in a
predetermined region of the array;

each different probe oligonucleotide is attached to a surface
through a terminal covalent bond;

the density of said different probe oligonucleotides is greater
than about 60 different oligonucleotides per 1 cm²;

(b) hybridizing said nucleic acid samples to said one or more 
arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and 
probe oligonucleotides in said one or more arrays that are complementary to said 
nucleic acids or subsequences thereof; 

(c) determining the differences in hybridization between said 
nucleic acid samples wherein said differences in hybridization indicate differences in 
said nucleic acid levels. 

more oligonucleotide arrays with a ligase.

28. A method of identifying differences in nucleic acid levels between
two or more nucleic acid samples, said method comprising the steps of:

(a) providing one or more oligonucleotide arrays each
comprising probe oligonucleotides wherein said probe oligonucleotides are not chosen
to hybridize to nucleic acids derived from particular preselected genes or mRNAs;

(b) hybridizing said nucleic acid samples to said one or more
arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and
probe oligonucleotides in said one or more arrays that are complementary to said
nucleic acids or subsequences thereof; and

(c) determining differences in hybridization between said
nucleic acid samples wherein said differences in hybridization indicate differences in
said nucleic acid levels.

29. The method of claim 28, wherein said probe oligonucleotides are
pairs of probe oligonucleotides that differ from each other in preselected nucleotides.

30. The method of claim 29, wherein said determining comprises
determining the difference in sample nucleic acid hybridization intensity between the
members of said pairs of probe oligonucleotides.

31. A method of identifying differences in nucleic acid levels between
two or more nucleic acid samples, said method comprising the steps of:

(a) providing one or more oligonucleotide arrays each
comprising probe oligonucleotides wherein said probe oligonucleotides comprise a
nucleotide sequence or subsequences selected according to a process selected from the
group consisting of a random selection, a haphazard selection, a nucleotide
composition biased selection, and all possible oligonucleotides of a preselected
length;

(b) hybridizing said nucleic acid samples to said one or more
arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and
probe oligonucleotides in said one or more arrays that are complementary to said
nucleic acids or subsequences thereof; and

(c) determining differences in hybridization between said
nucleic acid samples wherein said differences in hybridization indicate differences in
said nucleic acid levels.
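Two of the selection processes listed in claim 31 lend themselves to a short Python sketch: random selection, and a nucleotide-composition-biased selection. Using GC content as the composition criterion, rejection sampling as the mechanism, and the window bounds `gc_min`/`gc_max` are all assumptions for illustration; the claim does not say which composition bias is meant. "Haphazard" selection is not modeled here, and the all-possible-oligonucleotides option amounts to enumerating every sequence of the chosen length.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def random_probes(count, length, seed=None):
    """Random selection: each base drawn uniformly from A, C, G, T."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(count)]


def composition_biased_probes(count, length, gc_min, gc_max, seed=None):
    """Composition-biased selection by rejection sampling: keep random
    candidates whose GC fraction lies within [gc_min, gc_max]."""
    # Refuse windows that no sequence of this length can satisfy (avoids an endless loop).
    if not any(gc_min <= k / length <= gc_max for k in range(length + 1)):
        raise ValueError("no sequence of this length can meet the GC window")
    rng = random.Random(seed)
    probes = []
    while len(probes) < count:
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if gc_min <= gc_fraction(candidate) <= gc_max:
            probes.append(candidate)
    return probes


print(random_probes(3, 8, seed=1))
print(composition_biased_probes(3, 10, 0.4, 0.6, seed=2))
```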

32. The method of claim 31, wherein said nucleotide sequence or
nucleotide subsequences are all possible oligonucleotides of a preselected length
selected from the group consisting of: all possible 6-mers, all possible 7-mers, all
possible 8-mers, all possible 9-mers, all possible 10-mers, all possible 11-mers, and all
possible 12-mers.

33. A method of simultaneously monitoring the expression of a
multiplicity of genes, said method comprising:

(a) providing a pool of target nucleic acids comprising RNA transcripts
of one or more of said genes, or nucleic acids derived from said RNA transcripts;

(b) hybridizing said pool of nucleic acids to an oligonucleotide array
comprising probe oligonucleotides immobilized on a surface;

(c) contacting said oligonucleotide array with a ligase; and

(d) quantifying the hybridization of said nucleic acids to said array
wherein said quantifying provides a measure of the levels of transcription of said
genes.
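The quantifying step of claim 33 could, for instance, summarize probe signals gene by gene. The Python sketch below averages background-subtracted intensities over the probes annotated to each gene. The probe-to-gene annotation, the single global `background` value, and averaging as the summary statistic are all assumptions, not details given in the claim.

```python
from collections import defaultdict
from statistics import mean


def gene_levels(probe_intensities, probe_to_gene, background=0.0):
    """Average background-subtracted probe intensity per gene.

    probe_intensities: mapping of probe ID -> measured intensity.
    probe_to_gene: hypothetical annotation of probe ID -> gene name;
    probes without an annotation are ignored.
    Values that fall below `background` are clipped to zero.
    """
    per_gene = defaultdict(list)
    for probe_id, intensity in probe_intensities.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None:
            per_gene[gene].append(max(intensity - background, 0.0))
    return {gene: mean(values) for gene, values in per_gene.items()}


# Toy inputs for illustration only.
scan = {"p1": 150.0, "p2": 250.0, "p3": 40.0, "p4": 999.0}
annotation = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}
print(gene_levels(scan, annotation, background=50.0))
```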

34. The method of claim 33, wherein said probe oligonucleotides
comprise nucleotide sequences or nucleotide subsequences complementary to
preselected RNA transcripts of one or more of said genes, or nucleic acids derived
from said RNA transcripts.

35. A method of simultaneously monitoring the expression of a
multiplicity of genes, said method comprising:

(a) providing one or more oligonucleotide arrays comprising
probe oligonucleotides wherein said probe oligonucleotides comprise a constant
region and a variable region;

(b) providing a pool of target nucleic acids comprising RNA
transcripts of one or more of said genes, or nucleic acids derived from said RNA
transcripts;

(c) hybridizing said pool of nucleic acids to said array of
oligonucleotide probes; and

(d) quantifying the hybridization of said nucleic acids to said
array wherein said quantifying provides a measure of the levels of transcription of said
genes.

36. The method of claim 35, wherein said probe oligonucleotides
comprise nucleotide sequences or nucleotide subsequences complementary to
preselected RNA transcripts of one or more of said genes, or nucleic acids derived
from said RNA transcripts.

37. A method of making a nucleic acid array for identifying
differences in nucleic acid levels between two or more nucleic acid samples, said
method comprising the steps of:

(a) providing an oligonucleotide array comprising probe
oligonucleotides wherein said probe oligonucleotides comprise a constant region and
a variable region;

(b) hybridizing one or more of said nucleic acid samples to said 
arrays to form hybrid duplexes of said variable region and nucleic acids in said nucleic 
acid samples comprising subsequences complementary to said variable region; 

(c) attaching the sample nucleic acids comprising said hybrid 
duplexes to said array of probe oligonucleotides; and 

(d) removing unattached nucleic acids to provide a high density 
oligonucleotide array bearing sample nucleic acids attached to said array. 






38. A method of making a nucleic acid array for identifying
differences in nucleic acid levels between two or more nucleic acid samples, said
method comprising the steps of:

(a) providing an array comprising more than 100 different
probe oligonucleotides wherein:

each different probe oligonucleotide is localized in a
predetermined region of the array;

each different probe oligonucleotide is attached to a surface
through a terminal covalent bond;

the density of said different probe oligonucleotides is greater
than about 60 different oligonucleotides per 1 cm²;

(b) contacting said array with one or more of said two or more
nucleic acid samples whereby nucleic acids in said nucleic acid samples form hybrid
duplexes with probe oligonucleotides in said array;

(c) attaching the sample nucleic acids comprising said hybrid
duplexes to said array of probe oligonucleotides; and

(d) removing unattached nucleic acids to provide a high density
oligonucleotide array bearing sample nucleic acids attached to said array.
39. A kit for identifying differences in nucleic acid levels between two
or more nucleic acid samples, said kit comprising:

a container containing one or more oligonucleotide arrays, said
arrays comprising probe oligonucleotides attached to a surface; and
a container containing a ligase.

40. A kit for identifying differences in nucleic acid levels between two
or more nucleic acid samples, said kit comprising:

a container containing one or more oligonucleotide arrays, said
arrays comprising probe oligonucleotides wherein said probe oligonucleotides
comprise a constant region and a variable region.

41. The kit of claim 40, further comprising a constant oligonucleotide
complementary to said constant region or a subsequence thereof.


42. A method of labeling a nucleic acid, said method comprising the 

steps of: 

(a) providing a nucleic acid; 

(b) amplifying said nucleic acid to form amplicons; 

(c) fragmenting said amplicons to form fragments of said
amplicons; and

(d) coupling a labeled moiety to at least one of said fragments.

43. A method of labeling a nucleic acid, said method comprising the
steps of:

(a) providing a nucleic acid;

(b) transcribing said nucleic acid to form a transcribed nucleic acid;

(c) fragmenting said transcribed nucleic acid to form fragments
of said transcribed nucleic acid; and

(d) coupling a labeled moiety to at least one of said fragments.



44. A method of labeling a nucleic acid comprising the steps of:

(a) providing at least one nucleic acid coupled to a support;

(b) providing a labeled moiety capable of being coupled with a
terminal transferase to said nucleic acid;

(c) providing said terminal transferase; and

(d) coupling said labeled moiety to said nucleic acid using said
terminal transferase.

44. A method of labeling a nucleic acid comprising the steps of: 

(a) providing at least two nucleic acids coupled to a support; 

(b) increasing the number of monomer units of said nucleic
acids to form a common nucleic acid tail on said at least two nucleic acids;

(c) providing a labeled moiety capable of recognizing said
common nucleic acid tails; and

(d) contacting said common nucleic acid tails and said labeled
moiety.


45. A method of labeling a nucleic acid comprising the steps of: 

(a) providing at least one nucleic acid coupled to a support; 

(b) providing a labeled moiety capable of being coupled with a
ligase to said nucleic acid;

(c) providing said ligase; and 

(d) coupling said labeled moiety to said nucleic acid using said 

ligase. 



46. A compound having the formula: 






wherein R1 is hydrogen, hydroxyl, a phosphate linkage, or a phosphate group;
R2 is hydrogen or hydroxyl; 

R3 is hydrogen, hydroxyl, a phosphate linkage, or a phosphate group; and 
47. A compound having the formula:







wherein R1 is hydrogen, hydroxyl, a phosphate linkage, or a phosphate group;
R2 is hydrogen or hydroxyl; 

R3 is hydrogen, hydroxyl, a phosphate linkage, or a phosphate group; and 
R4 is a coupled labeled moiety. 


48. A method of identifying differences in nucleic acid levels between
two or more nucleic acid samples, said method comprising the steps of:

(a) providing one or more oligonucleotide arrays each
comprising probe oligonucleotides wherein said probe oligonucleotides comprise a
nucleotide sequence or subsequences selected according to a process selected from the
group consisting of a random selection, a haphazard selection, a nucleotide
composition biased selection, and all possible oligonucleotides of a preselected
length;

(b) providing software describing the location and sequence of
probe oligonucleotides on said array;

(c) hybridizing said nucleic acid samples to said one or more
arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and
probe oligonucleotides in said one or more arrays that are complementary to said
nucleic acids or subsequences thereof;

(d) operating software such that said hybridizing indicates
differences in said nucleic acid levels.
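Claim 48 pairs the array with software describing the location and sequence of each probe. A minimal Python sketch of how such a layout could be used to flag differences between two samples is shown below. The `(row, column)` keys, the log2-ratio statistic, the `pseudocount`, and the `min_log2_ratio` cutoff are assumptions chosen for illustration, not the software contemplated in the claim.

```python
import math


def differential_probes(layout, intensities_a, intensities_b,
                        min_log2_ratio=1.0, pseudocount=1.0):
    """Report probe sequences whose signal differs between two samples.

    layout: hypothetical map of array location (row, column) -> probe sequence.
    intensities_a / intensities_b: maps of location -> intensity for each sample;
    a location missing from a sample is treated as zero signal.
    Returns {sequence: log2((a + pseudocount) / (b + pseudocount))} for probes
    whose absolute log2 ratio is at least `min_log2_ratio`.
    """
    differences = {}
    for location, sequence in layout.items():
        a = intensities_a.get(location, 0.0) + pseudocount
        b = intensities_b.get(location, 0.0) + pseudocount
        ratio = math.log2(a / b)
        if abs(ratio) >= min_log2_ratio:
            differences[sequence] = ratio
    return differences


# Toy layout and intensities for illustration only.
layout = {(0, 0): "ACGTACGT", (0, 1): "TTTTCCCC", (1, 0): "GGGGAAAA"}
sample_a = {(0, 0): 1023.0, (0, 1): 100.0, (1, 0): 15.0}
sample_b = {(0, 0): 255.0, (0, 1): 110.0, (1, 0): 63.0}
print(differential_probes(layout, sample_a, sample_b))
```

The pseudocount keeps the ratio defined when a probe shows no signal in one sample; its value is an arbitrary choice here.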
Fig. 14a
Fig. 14c
Fig. 15d Restriction digest PCR products
Fig. 23^
Fig^Y Resequencing a target DNA molecule with a set of generic n-mer tiling
probes.
Fig. 2S Effect of applying correctness score transform to HIV data.
Fig -^ Mutation Detection by Intensity Comparisons
Fig. 3'^ Bubble formation detection of mutation in HIV genome
Fig 3/ Induced Difference Nearest Neighbor Probe Scoring:
Fig,3:^ Mutations found in an HIV PCR target (B) using a Generic
Ligation GeneChip and Induced Difference Analysis